Unix network programming by w richard stevens pdf download

2021.12.17 21:58

The hostnames usually describe the operating system (OS) as well. There are many details to consider in this program. We mention them briefly here, in case this is your first encounter with a network program, and provide more information on these topics later in the text.

Include our own header (line 1)

We include our own header, unp.

This header includes numerous system headers that are needed by most network programs and defines various constants that we use e.

Command-line arguments (lines 2-3)

This is the definition of the main function along with the command-line arguments.

The function returns a small integer descriptor that we can use to identify the socket in all future function calls e. The if statement contains a call to the socket function, an assignment of the return value to the variable named sockfd, and then a test of whether this assigned value is less than 0. The set of parentheses around the function call and assignment is required, given the precedence rules of C (the less-than operator has a higher precedence than assignment).
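To make the precedence point concrete, here is a minimal sketch of the call-assign-test idiom. The helper name open_tcp_socket and the use of perror for reporting are assumptions made for illustration; they are not the text's own error routine.

```c
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: creates an IPv4 TCP socket.
   Returns the descriptor, or -1 after printing the system error. */
int open_tcp_socket(void)
{
    int sockfd;

    /* The inner parentheses force the assignment to happen first.
       Without them the expression would parse as
       sockfd = (socket(...) < 0), storing 0 or 1 instead of the
       descriptor. */
    if ( (sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        perror("socket error");
        return -1;
    }
    return sockfd;
}
```

A caller would keep the returned descriptor and pass it to every later socket call.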

As a matter of coding style, the authors always place a space between the two opening parentheses, as a visual indicator that the left-hand side of the comparison is also an assignment. This style is copied from the Minix source code [Tanenbaum]. We use this same style in the while statement later in the program. In the preceding paragraph, we referred to a function named socket that is part of the sockets API. It prints our error message along with a description of the system error that occurred e.

We will describe them in Section D. The bzero function is derived from early Berkeley networking code. Nevertheless, we use it throughout the text, instead of the ANSI C memset function, because bzero is easier to remember (with only two arguments) than memset (with three arguments).

Almost every vendor that supports the sockets API also provides bzero, and if not, we provide a macro definition of it in our unp.

Indeed, the author of TCPv3 made the mistake of swapping the second and third arguments to memset in 10 occurrences in the first printing. A C compiler cannot catch this error because both arguments are of the same type.

The call to memset still worked, but did nothing. The number of bytes to initialize was specified as 0. The programs still worked, because only a few of the socket functions actually require that the final 8 bytes of an Internet socket address structure be set to 0. Nevertheless, it was an error, and one that could be avoided by using bzero, because swapping the two arguments to bzero will always be caught by the C compiler if function prototypes are used.
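The swapped-argument mistake is easy to reproduce. The sketch below uses hypothetical helper names (all_zero, clear_with_bzero, clear_swapped) and assumes a system that still declares bzero in <strings.h>. The swapped call passes its values through variables only so that the example compiles quietly; some current compilers do warn when a literal 0 is passed as the length.

```c
#include <string.h>
#include <strings.h>      /* bzero on most systems */
#include <netinet/in.h>

/* Returns 1 if every byte of the structure is zero, else 0. */
int all_zero(const struct sockaddr_in *sa)
{
    const unsigned char *p = (const unsigned char *) sa;
    size_t i;

    for (i = 0; i < sizeof(*sa); i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/* Two arguments: pointer and length. Nothing to swap by accident. */
void clear_with_bzero(struct sockaddr_in *sa)
{
    bzero(sa, sizeof(*sa));
}

/* Deliberately reproduces the mistake: the fill value and the length
   are swapped, so the length is 0 and no bytes are written. */
void clear_swapped(struct sockaddr_in *sa)
{
    int    value = (int) sizeof(*sa);   /* was meant to be the length */
    size_t len   = 0;                   /* was meant to be the fill value */

    memset(sa, value, len);
}
```

After clear_swapped the structure still holds whatever it held before, which is exactly the silent failure described above.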

It is new with IPv6 (which we will talk more about in Appendix A). Do not worry if your system does not yet support this function; we will provide an implementation of it in Section 3.

Establish connection with server (lines 17-18)

The connect function, when applied to a TCP socket, establishes a TCP connection with the server specified by the socket address structure pointed to by the second argument.

In the unp. Every time one of the socket functions requires a pointer to a socket address structure, that pointer must be cast to a pointer to a generic socket address structure.
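As a rough illustration of the cast, the sketch below fills an IPv4 socket address structure and hands it to connect. The helper names fill_ipv4_addr and connect_ipv4 are hypothetical, and the use of inet_pton for the address conversion is an assumption of this example.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: fills an IPv4 socket address structure from a
   dotted-decimal string and a port in host byte order.
   Returns 0 on success, -1 if the string is not a valid address. */
int fill_ipv4_addr(struct sockaddr_in *sa, const char *ip, uint16_t port)
{
    memset(sa, 0, sizeof(*sa));
    sa->sin_family = AF_INET;
    sa->sin_port   = htons(port);       /* network byte order */
    if (inet_pton(AF_INET, ip, &sa->sin_addr) != 1)
        return -1;
    return 0;
}

/* connect is declared to take a pointer to the generic struct sockaddr,
   so the IPv4-specific pointer is cast at the call site. */
int connect_ipv4(int sockfd, const struct sockaddr_in *sa)
{
    return connect(sockfd, (const struct sockaddr *) sa, sizeof(*sa));
}
```

The same cast appears wherever a socket function accepts an address, for example in bind and accept.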

We will talk more about generic socket address structures when explaining Figure 3. We must be careful when using TCP because it is a byte-stream protocol with no record boundaries. With a byte-stream protocol, these 26 bytes can be returned in numerous ways: a single TCP segment containing all 26 bytes of data, in 26 TCP segments each containing 1 byte of data, or any other combination that totals to 26 bytes.

Therefore, when reading from a TCP socket, we always need to code the read in a loop and terminate the loop when either read returns 0 i. In this example, the end of the record is being denoted by the server closing the connection. This technique is also used by version 1. Other techniques are available.
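A read loop of this kind might look like the following sketch. The function name read_until_eof is hypothetical, and retrying on EINTR is a choice made for this example rather than something the text prescribes. The loop works on any descriptor, so it can be exercised with a pipe as well as a socket.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: reads from fd until the peer closes (read
   returns 0), the buffer is full, or an error occurs.
   Returns the total number of bytes stored, or -1 on error. */
ssize_t read_until_eof(int fd, char *buf, size_t capacity)
{
    size_t  total = 0;
    ssize_t n;

    while (total < capacity) {
        n = read(fd, buf + total, capacity - total);
        if (n == 0)
            break;                  /* other end closed: end of record */
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal, retry */
            return -1;
        }
        total += (size_t) n;        /* may be fewer bytes than requested */
    }
    return (ssize_t) total;
}
```

Each call to read may return any part of the data, so the caller only trusts the total once the loop has ended.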

The important concept here is that TCP itself provides no record markers: If an application wants to delineate the ends of records, it must do so itself and there are a few common ways to accomplish this.

Terminate program (line 26)

exit terminates the program.

Unix always closes all open descriptors when a process terminates, so our TCP socket is now closed. As we mentioned, the text will go into much more detail on all the points we just described.

To modify the program to work under IPv6, we must change the code. Only five lines are changed, but what we now have is another protocol-dependent program; this time, it is dependent on IPv6. It is better to make a program protocol-independent. Humans work better with names instead of numbers e. In Chapter 11, we will discuss the functions that convert between hostnames and IP addresses, and between service names and ports.

We purposely put off the discussion of these functions and continue using IP addresses and port numbers so we know exactly what goes into the socket address structures that we must fill in and examine. This also avoids complicating our discussion of network programming with the details of yet another set of functions.

In Figure 1. We find that most of the time, this is what we want to do. Occasionally, we want to do something other than terminate when one of these functions returns an error, as in Figure 5.

Since terminating on an error is the common case, we can shorten our programs by defining a wrapper function that performs the actual function call, tests the return value, and terminates on an error. Our wrapper function is shown in Figure 1. Whenever you encounter a function name in the text that begins with an uppercase letter, that is one of our wrapper functions. It calls a function whose name is the same but begins with the lowercase letter.
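A wrapper in this style could be sketched as follows. The original figure is not reproduced here, so this is an approximation: perror and exit stand in for whatever error routine the text's own version calls.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>

/* Wrapper with the capitalized name: calls socket, tests the return
   value, and terminates the process on an error. */
int Socket(int family, int type, int protocol)
{
    int n;

    if ( (n = socket(family, type, protocol)) < 0) {
        perror("socket error");     /* stand-in for the text's error routine */
        exit(EXIT_FAILURE);
    }
    return n;
}
```

With the wrapper in place, a call such as sockfd = Socket(AF_INET, SOCK_STREAM, 0); needs no error test of its own.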

When describing the source code that is presented in the text, we always refer to the lowest level function being called e. While these wrapper functions might not seem like a big savings, when we discuss threads in Chapter 26, we will find that thread functions do not set the standard Unix errno variable when an error occurs; instead, the errno value is the return value of the function. With careful C coding, we could use macros instead of functions, providing a little run-time efficiency, but these wrapper functions are rarely the performance bottleneck of a program.
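The thread case shows why a wrapper earns its keep. In the sketch below, which again substitutes perror and exit for the text's own error routine, the error code comes back as the return value of pthread_mutex_lock and has to be copied into errno by hand before it can be reported.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Wrapper for a thread function: the error code is the return value,
   not errno, so we store it in errno before reporting it. */
void Pthread_mutex_lock(pthread_mutex_t *mptr)
{
    int n;

    if ( (n = pthread_mutex_lock(mptr)) == 0)
        return;
    errno = n;                      /* make the code visible to perror */
    perror("pthread_mutex_lock error");
    exit(EXIT_FAILURE);
}
```

Without the wrapper, every call site would need its own temporary variable and its own assignment to errno.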

Our choice of capitalizing the first character of a function name is a compromise. Our style seems the least distracting while still providing a visual indication that some other function is really being called. This technique has the side benefit of checking for errors from functions whose error returns are often ignored: close and listen, for example. Throughout the rest of this book, we will use these wrapper functions unless we need to check for an explicit error and handle it in some way other than terminating the process.

We do not show the source code for all our wrapper functions, but the code is freely available (see the Preface). The value of errno is set by a function only if an error occurs. Its value is undefined if the function does not return an error. No error has a value of 0.

Storing errno in a global variable does not work with multiple threads that share all global variables. We will talk about solutions to this problem in Chapter We use the wrapper functions that we described in the previous section and show this server in Figure 1. Later we will see how we can restrict the server to accepting a client connection on just a single interface.

Convert socket to listening socket (line 16)

By calling listen, the socket is converted into a listening socket, on which incoming connections from clients will be accepted by the kernel. These three steps, socket, bind, and listen, are the normal steps for any TCP server to prepare what we call the listening descriptor (listenfd in this example).
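The three steps can be sketched in one function. The name make_listener, its parameters, and the choice of the wildcard address INADDR_ANY are assumptions of this example; the function returns -1 instead of terminating so that the caller decides how to handle a failure.

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: socket, bind, listen.
   port is in host byte order; 0 lets the kernel choose a port.
   Returns the listening descriptor, or -1 on error. */
int make_listener(uint16_t port, int backlog)
{
    int                listenfd;
    struct sockaddr_in servaddr;

    if ( (listenfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
        return -1;

    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family      = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);   /* any local interface */
    servaddr.sin_port        = htons(port);

    if (bind(listenfd, (struct sockaddr *) &servaddr, sizeof(servaddr)) < 0 ||
        listen(listenfd, backlog) < 0) {
        close(listenfd);
        return -1;
    }
    return listenfd;
}
```

The returned descriptor is the one a server would then pass to accept.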

It specifies the maximum number of client connections that the kernel will queue for this listening descriptor. We say much more about this queueing in Section 4.

Accept client connection, send reply (lines 17-21)

Normally, the server process is put to sleep in the call to accept, waiting for a client connection to arrive and be accepted.

A TCP connection uses what is called a three-way handshake to establish a connection. When this handshake completes, accept returns, and the return value from the function is a new descriptor (connfd) that is called the connected descriptor.

This new descriptor is used for communication with the new client. A new descriptor is returned by accept for each client that connects to our server. Universal Time UTC. The next library function, ctime, converts this integer value into a human-readable string such as Mon May 26 A carriage return and linefeed are appended to the string by snprintf, and the result is written to the client by write. Calls to sprintf cannot check for overflow of the destination buffer.

Virtually all vendors provide snprintf as part of the standard C library, and many free implementations are also available. We use snprintf throughout the text, and we recommend using it instead of sprintf in all your programs for reliability. Other functions that we should be careful with are gets, strcat, and strcpy, normally calling fgets, strncat, and strncpy instead. Even better are the more recently available functions strlcat and strlcpy, which ensure the result is a properly terminated string.
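The benefit of passing the buffer size is easiest to see with a deliberately small buffer. The sketch below formats a time value the way the server description suggests (the first 24 characters of the ctime string followed by a carriage return and linefeed). The function name format_daytime and the "%.24s\r\n" format string are assumptions of this example.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: formats the time into buf, followed by CR LF.
   Returns the number of bytes stored (excluding the terminating null),
   or -1 if the result did not fit and was truncated. */
int format_daytime(char *buf, size_t size, time_t now)
{
    /* snprintf never writes more than size bytes, including the null,
       and returns the length the full result would have had. */
    int n = snprintf(buf, size, "%.24s\r\n", ctime(&now));

    if (n < 0 || (size_t) n >= size)
        return -1;
    return n;
}
```

With sprintf the same call on a short buffer would write past its end; here the caller simply sees -1 and a safely terminated, truncated string.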

Additional tips on writing secure network programs are found in Chapter 23 of [Garfinkel, Schwartz, and Spafford].

Terminate connection (line 22)

The server closes its connection with the client by calling close. As with the client in the previous section, we have only examined this server briefly, saving all the details for later in the book. Note the following points: We will show a protocol-independent version that uses the getaddrinfo function in Figure If multiple client connections arrive at about the same time, the kernel queues them, up to some limit, and returns them to accept one at a time.

This daytime server, which requires calling two library functions, time and ctime, is quite fast. But if the server took more time to service each client (say a few seconds or a minute), we would need some way to overlap the service of one client with another client. The server that we show in Figure 1. There are numerous techniques for writing a concurrent server, one that handles multiple clients at the same time.

The simplest technique for a concurrent server is to call the Unix fork function Section 4. This requires that we add code to the server to run correctly as a Unix daemon: a process that can run in the background, unattached to a terminal. We will cover this in Section

This is a seven-layer model, which we show in Figure 1. We consider the bottom two layers of the OSI model as the device driver and networking hardware that are supplied with the system.

Normally, we need not concern ourselves with these layers other than being aware of some properties of the datalink, such as the Ethernet maximum transfer unit (MTU), which we describe in Section 2.

This is called a raw socket, and we will talk about this in Chapter The upper three layers of the OSI model are combined into a single layer called the application. With the Internet protocols, there is rarely any distinction between the upper three layers of the OSI model.

We already mentioned raw sockets, and in Chapter 29 we will see that we can even bypass the IP layer completely to read and write our own datalink-layer frames. Why do sockets provide the interface from the upper three layers of the OSI model into the transport layer? There are two reasons for this design, which we note on the right side of Figure 1. First, the upper three layers handle all the details of the application (FTP, Telnet, or HTTP, for example) and know little about the communication details.

The lower four layers know little about the application, but handle all the communication details: sending data, waiting for acknowledgments, sequencing data that arrives out of order, calculating and verifying checksums, and so on. The second reason is that the upper three layers often form what is called a user process while the lower four layers are normally provided as part of the operating system (OS) kernel. Unix provides this separation between the user process and the kernel, as do many other contemporary operating systems.

Therefore, the interface between layers 4 and 5 is the natural place to build the API. A few changes to the sockets API also took place in with the 4.

The path down the figure from 4. Therefore, starting in , Berkeley provided the first of the BSD networking releases, which contained all the networking code and various other pieces of the BSD system that were not constrained by the Unix source code license requirement.

The final releases from Berkeley were 4. More information on the various BSD releases, and on the history of the various Unix systems in general, can be found in Chapter 1 of [McKusick et al. Many Unix systems started with some version of the BSD networking code, including the sockets API, and we refer to these implementations as Berkeley-derived implementations.

Some of these versions have Berkeley-derived networking code e. We also note that Linux, a popular, freely available implementation of Unix, does not fit into the Berkeley-derived classification: Its networking code and sockets API were developed from scratch. For each host, we show the OS and the type of hardware since some of the operating systems run on more than one type of hardware.

The name within each box is the hostname that appears in the text. FreeBSD 4. The topology shown in Figure 1. Instead, virtual private networks (VPNs) or secure shell (SSH) connections provide connectivity between these machines regardless of where they live physically. Section A.

Discovering Network Topology

We show the network topology in Figure 1. Although there are no current Unix standards with regard to network configuration and administration, two basic commands are provided by most Unix systems and can be used to discover some details of a network: netstat and ifconfig.

Check the manual (man) pages for these commands on your system to see the details on the information that is output. We also specify the -n flag to print numeric addresses, instead of trying to find names for the networks.

This shows us the interfaces and their names. The loopback interface is called lo and the Ethernet is called eth0. The next example shows a host with IPv6 support. Note: We have wrapped some of the longer lines to align the output fields. We normally specify the -n flag to print numeric addresses. This also shows the IP address of the default router. Given the interface names, we execute ifconfig to obtain the details for each interface.

This shows the IP address, subnet mask, and broadcast address. Some implementations provide a -a flag, which prints information on all configured interfaces. Their efforts have produced roughly 4, pages of specifications covering over 1, programming interfaces [Josey]. In this text, we will refer to this standard as simply The POSIX Specification, except in sections like this one where we are discussing specifics of various older standards.

The easiest way to acquire a copy of this consolidated standard is to either order it on CD-ROM or access it via the Web (free of charge). Minimal changes were made from the to the version. This was an update to the The Three chapters on threads were added, along with additional sections on thread synchronization (mutexes and condition variables), thread scheduling, and synchronization scheduling.

Often, the rationale is as informative as the official standard. It is an international consortium of vendors and end-user customers from industry, government, and academia. The number of interfaces required by Unix 98 increases from 1, to 1,, although for a workstation this jumps to 3,, because it includes the Common Desktop Environment (CDE), which in turn requires the X Window System and the Motif user interface.

Getting over 50 companies to agree on a single standard is certainly a landmark in the history of Unix. Historically, most Unix systems show either a Berkeley heritage or a System V heritage, but these differences are slowly disappearing as most vendors adopt the standards. The main differences still existing deal with system administration, one area that no standard currently addresses. Whenever possible we will use the standard functions. The Internet Engineering Task Force (IETF) is a large, open, international community of network designers, operators, vendors, and researchers concerned with the evolution of the Internet architecture and the smooth operation of the Internet.

It is open to any interested individual. Internet standards normally deal with protocol issues and not with programming APIs. These are informational RFCs, not standards, and were produced to speed the deployment of portable applications by the numerous vendors working on early releases of IPv6.

One reason is for larger addressing within a process i. The common programming model for existing 32-bit Unix systems is called the ILP32 model, denoting that integers (I), long integers (L), and pointers (P) occupy 32 bits. The model that is becoming most prevalent for 64-bit Unix systems is called the LP64 model, meaning only long integers (L) and pointers (P) require 64 bits.

From a programming perspective, the LP64 model means we cannot assume that a pointer can be stored in an integer. Some XTI structures also had members with a datatype of long e.
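A short check makes the difference between the two models visible. The helper names data_model and roundtrip_ok are hypothetical, and the example assumes a compiler that provides intptr_t, an integer type intended to be wide enough to hold a pointer; a plain int is not under LP64.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: names the data model the compiler uses,
   judged only by the sizes of int, long, and pointers. */
const char *data_model(void)
{
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32";
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";
    return "other";
}

/* Round-trips a pointer through intptr_t. Doing the same through int
   would lose the upper half of the pointer under LP64. */
int roundtrip_ok(void *p)
{
    intptr_t as_int = (intptr_t) p;

    return (void *) as_int == p;
}
```

Printing data_model() with printf("%s\n", data_model()) on a given system is a quick way to see which assumptions hold there.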

If these had been left as is, both would change from 32-bit values to 64-bit values when a Unix system changes from the ILP32 to the LP64 model. In both instances, there is no need for a 64-bit datatype: The length of a socket address structure is a few hundred bytes at most, and the use of long for the XTI structure members was a mistake.

The solution is to use datatypes designed specifically to handle these scenarios. The reason for not changing these values from 32 bits to 64 bits is to make it easier to provide binary compatibility on the new 64-bit systems for applications compiled under 32-bit systems. These two examples introduce many of the terms and concepts that are expanded on throughout the rest of the book.

Our client was protocol-dependent on IPv4 and we modified it to use IPv6 instead. But this just gave us another protocol-dependent program. In Chapter 11, we will develop some functions to let us write protocol-independent code, which will be important as the Internet starts using IPv6.

Throughout the text, we will use the wrapper functions developed in Section 1. Our wrapper functions all begin with a capital letter.

Exercises

1. Compile and test the TCP daytime client in Figure 1. Run the program a few times, specifying a different IP address as the command-line argument each time. Compile and run the program.

What happens? Find the errno value corresponding to the error that is printed. How can you find more information on this error? Print the value of the counter before terminating. Compile and run your new client. Next, change the single call to write into a loop that calls write for each byte of the result string.

Compile this modified server and start it running in the background. Start this client, specifying the IP address of the host on which the modified server is running as the command-line argument. If possible, also try to run the client and server on different hosts. Our goal is to provide enough detail from a network programming perspective to understand how to use the protocols and provide references to more detailed descriptions of their actual design, implementation, and history.

SCTP is a newer protocol, originally designed for transport of telephony signaling across the Internet. While it is possible to use IPv4 or IPv6 directly, bypassing the transport layer, this technique, often called raw sockets, is used much less frequently.

UDP is a simple, unreliable datagram protocol, while TCP is a sophisticated, reliable byte-stream protocol.

SCTP is similar to TCP as a reliable transport protocol, but it also provides message boundaries, transport-level support for multihoming, and a way to minimize head-of-line blocking. We need to understand the services provided by these transport protocols to the application, so that we know what is handled by the protocol and what we must handle in the application. There are features of TCP that, when understood, make it easier for us to write robust clients and servers.

Also, when we understand these features, it becomes easier to debug our clients and servers using commonly provided tools such as netstat. Figure 2. We show both IPv4 and IPv6 in this figure. The next six applications use IPv4. Section 2.

We also note in Figure 2. We now describe each of the protocol boxes in this figure.

IPv4: Internet Protocol version 4. IPv4, which we often denote as just IP, has been the workhorse protocol of the IP suite since the early s. It uses bit addresses Section A.

IPv6: Internet Protocol version 6. IPv6 was designed in the mids as a replacement for IPv4. The major change is a larger address comprising bits Section A.

TCP is a connection-oriented protocol that provides a reliable, full-duplex byte stream to its users.

TCP sockets are an example of stream sockets. TCP takes care of details such as acknowledgments, timeouts, retransmissions, and the like. Most Internet application programs use TCP. There is no guarantee that UDP datagrams ever reach their intended destination.

SCTP is a connection-oriented protocol that provides a reliable full-duplex association. SCTP provides a message service, which maintains record boundaries.

ICMP handles error and control information between routers and hosts. It is sometimes used when a diskless node is booting. This interface provides access to the datalink layer. It is normally found on Berkeley-derived kernels. DLPI: Datalink provider interface.

This interface also provides access to the datalink layer. It is normally provided with SVR4. Each Internet protocol is defined by one or more documents called a Request for Comments (RFC), which are their formal specifications. The solution to Exercise 2. The 4. It is described in RFC [Postel]. The application writes a message to a UDP socket, which is then encapsulated in a UDP datagram, which is then further encapsulated as an IP datagram, which is then sent to its destination.

There is no guarantee that a UDP datagram will ever reach its final destination, that order will be preserved across the network, or that datagrams arrive only once. The problem that we encounter with network programming using UDP is its lack of reliability. If a datagram reaches its final destination but the checksum detects an error, or if the datagram is dropped in the network, it is not delivered to the UDP socket and is not automatically retransmitted.

If we want to be certain that a datagram reaches its destination, we can build lots of features into our application: acknowledgments from the other end, timeouts, retransmissions, and the like.

Each UDP datagram has a length. The length of a datagram is passed to the receiving application along with the data. We have already mentioned that TCP is a byte-stream protocol, without any record boundaries at all Section 1.

We also say that UDP provides a connectionless service, as there need not be any long-term relationship between a UDP client and server. For example, a UDP client can create a socket and send a datagram to a given server and then immediately send another datagram on the same socket to a different server. First, TCP provides connections between clients and servers.
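Because each datagram carries its own destination, one unconnected socket can address any number of servers. The sketch below wraps sendto in a hypothetical helper named send_datagram; the helper and its parameters are assumptions of this example.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: sends one datagram to ip:port on an unconnected
   UDP socket. Returns the number of bytes handed to the kernel, or -1
   on error (including an invalid address string). */
ssize_t send_datagram(int sockfd, const char *ip, uint16_t port,
                      const void *msg, size_t len)
{
    struct sockaddr_in dest;

    memset(&dest, 0, sizeof(dest));
    dest.sin_family = AF_INET;
    dest.sin_port   = htons(port);
    if (inet_pton(AF_INET, ip, &dest.sin_addr) != 1)
        return -1;                  /* not a valid dotted-decimal address */

    /* The destination is supplied per call; nothing is remembered
       between calls, so the next datagram can go somewhere else. */
    return sendto(sockfd, msg, len, 0,
                  (const struct sockaddr *) &dest, sizeof(dest));
}
```

Two consecutive calls such as send_datagram(fd, "192.0.2.1", 9, "a", 1) and send_datagram(fd, "198.51.100.2", 9, "b", 1) reuse the same descriptor for different servers; the addresses here are placeholders from the documentation ranges. A successful return only means the datagram was handed to the kernel, not that it arrived.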

A TCP client establishes a connection with a given server, exchanges data with that server across the connection, and then terminates the connection. TCP also provides reliability. When TCP sends data to the other end, it requires an acknowledgment in return. If an acknowledgment is not received, TCP automatically retransmits the data and waits a longer amount of time.

After some number of retransmissions, TCP will give up, with the total amount of time spent trying to send data typically between 4 and 10 minutes (depending on the implementation). Note that TCP does not guarantee that the data will be received by the other endpoint, as this is impossible. It delivers data to the other endpoint if possible, and notifies the user (by giving up on retransmissions and breaking the connection) if it is not possible. TCP contains algorithms to estimate the round-trip time (RTT) between a client and server dynamically so that it knows how long to wait for an acknowledgment.

TCP also sequences the data by associating a sequence number with every byte that it sends. For example, assume an application writes 2, bytes to a TCP socket, causing TCP to send two segments, the first containing the data with sequence numbers 1 to 1, and the second containing the data with sequence numbers 1, to 2, If the segments arrive out of order, the receiving TCP will reorder the two segments based on their sequence numbers before passing the data to the receiving application.
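The reordering step can be imitated in user space with a toy model. This is not how a kernel implements TCP: the struct segment type, the reassemble function, and the small payloads are all hypothetical, and the sketch ignores sequence-number wraparound, retransmission, and overlapping segments.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy model of a received segment. */
struct segment {
    uint32_t    seq;    /* sequence number of the first byte in data */
    size_t      len;    /* number of payload bytes */
    const char *data;
};

/* qsort comparator: ascending by sequence number. */
static int by_seq(const void *a, const void *b)
{
    const struct segment *x = a, *y = b;

    return (x->seq > y->seq) - (x->seq < y->seq);
}

/* Sorts the segments by sequence number and copies their payloads, in
   order, into out. Returns the number of bytes copied, or 0 if a byte
   is missing or duplicated, or if out is too small. */
size_t reassemble(struct segment *segs, size_t count, uint32_t first_seq,
                  char *out, size_t cap)
{
    size_t   total = 0;
    uint32_t expected = first_seq;
    size_t   i;

    qsort(segs, count, sizeof(segs[0]), by_seq);
    for (i = 0; i < count; i++) {
        if (segs[i].seq != expected || segs[i].len > cap - total)
            return 0;
        memcpy(out + total, segs[i].data, segs[i].len);
        total    += segs[i].len;
        expected += (uint32_t) segs[i].len;   /* one number per byte */
    }
    return total;
}
```

Given a segment with sequence number 7 carrying "world" that arrives before a segment with sequence number 1 carrying "hello ", the function delivers the 11 bytes "hello world" in order.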

There is no reliability provided by UDP. Topics covered: Perl function libraries and techniques that allow programs to interact with resources over a network. As networks, devices, and systems continue to evolve, software engineers face the unique challenge of creating reliable distributed applications within frequently changing environments. This book guides software professionals through the traps and pitfalls of developing efficient, portable, and flexible networked applications.

It explores the inherent design complexities of concurrent networked applications and the tradeoffs that must be considered when working to master them. The book then provides the essential design dimensions, patterns, and principles needed to develop flexible and efficient concurrent networked applications.

The Art of UNIX Programming poses the belief that understanding the unwritten UNIX engineering tradition and mastering its design patterns will help programmers of all stripes to become better programmers. This book attempts to capture the engineering wisdom and design philosophy of the UNIX, Linux, and Open Source software development community as it has evolved over the past three decades, and as it is applied today by the most experienced programmers. A "coder's book", this title tells how to use Pthreads in the real world, making efficient and portable applications.

Pthreads are an important set of current tools programmers need to have in today's network-intensive climate. Thanks to the ongoing efforts of thousands of Linux developers, Linux is more ready than ever for deployment at the frontlines of the real world. The authors of this book know that terrain well, and I am happy to leave you in their most capable hands. Unique and highly recommended.

The authors spell out detailed best practices for every facet of system administration, including storage management, network design and administration, web hosting, software configuration management, performance analysis, Windows interoperability, and much more. Sysadmins will especially appreciate the thorough and up-to-date discussions of such difficult topics such as DNS, LDAP, security, and the management of IT service organizations. They explain complex tasks in detail and illustrate these tasks with examples drawn from their extensive hands-on experience.

This book contains everything you need to make your application program support IPv6. You'll find descriptions of over system calls and library functions, and more than example programs, 88 tables, and diagrams. The Linux Programming Interface is the most comprehensive single-volume work on the Linux and UNIX programming interface, and a book that's destined to become a new classic. Designing application and middleware software to run in concurrent and networked environments is a significant challenge to software developers.

The patterns catalogued in this second volume of Pattern-Oriented Software Architectures POSA form the basis of a pattern language that addresses issues associated with concurrency and networking. The book presents 17 interrelated patterns ranging from idioms through architectural designs. They cover core elements of building concurrent and network systems: service access and configuration, event handling, synchronization, and concurrency.

The book can be used to tackle specific software development problems or read from cover to cover to provide a fundamental understanding of the best practices for constructing concurrent and networked applications and middleware. Visit our Web Page. A hands-on guide to writing a Message Passing Interface, this book takes the reader on a tour across major MPI implementations, best optimization techniques, application relevant usage hints, and a historical retrospective of the MPI world, all based on a quarter of a century spent inside MPI.

Readers will learn to write MPI implementations from scratch, and to design and optimize communication mechanisms using pragmatic subsetting as the guiding principle. Alexander holds 26 patents (more pending) worldwide. There isn't a more practical or up-to-date book: this volume is the only one to cover the de facto standard implementation from the 4.

You will learn about such topics as the relationship between the sockets API and the protocol suite, and the differences between a host implementation and a router. In addition, the book covers the newest features of the 4. Richard Stevens presents a comprehensive guide to every form of IPC, including message passing, synchronization, shared memory, and Remote Procedure Unix network Stevens, W. Richard Stevens.

John Riley's Ownd
