20.1 Introduction to Connectionless Communication
Connectionless communication is an abstraction based on transmission of single messages or datagrams between sender and receiver. A datagram is a unit of data transferred from one endpoint to another. Connectionless communication makes no association between the endpoints, and a process can use a single connectionless endpoint to send messages to or receive messages from many other endpoints.
Figure 20.1 illustrates connectionless communication among four processes running on different hosts. Process A receives messages from several different sources on the same communication endpoint. Process A uses this communication endpoint both to reply to the message from C and to send a message to D. Process C uses its connectionless communication endpoint to send messages to both A and D. Since each message includes the sender's return address, the receiver knows where to send the response.
This chapter develops a model for connectionless communication based on UDP, the User Datagram Protocol. UDP is used in many common Internet applications and protocols, including DNS (name service), NFS (distributed file system), NTP (time protocol), RTP (Real-time Transport Protocol) and SNMP (network management).
A UDP communication endpoint is identified by host IP address and port number. The receiver can extract the address of the sender's communication endpoint and use the information as a return address in replying to the message. Because no connection is involved, connectionless communication might not follow the client-server model. However, the client-server communication pattern holds for many applications, with clients sending request messages to servers on well-known ports (e.g., NFS servers use port 2049).
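To make the (IP address, port) pair concrete, here is a minimal sketch that assumes IPv4 and the POSIX struct sockaddr_in that recvfrom fills in. The helper name format_endpoint is hypothetical and not part of UICI. It shows that the return address a receiver extracts amounts to two numbers, both stored in network byte order.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: render an IPv4 UDP endpoint as "a.b.c.d:port".
   Returns 0 on success, or -1 if the address is invalid or out is too small. */
int format_endpoint(const struct sockaddr_in *addr, char *out, size_t outsize) {
    char ip[INET_ADDRSTRLEN];
    int len;

    /* Both the address and the port are kept in network byte order. */
    if (inet_ntop(AF_INET, &addr->sin_addr, ip, sizeof(ip)) == NULL)
        return -1;
    len = snprintf(out, outsize, "%s:%u", ip, (unsigned)ntohs(addr->sin_port));
    if (len < 0 || (size_t)len >= outsize)
        return -1;                        /* output would have been truncated */
    return 0;
}
```

For an NFS server on port 2049 at 192.0.2.7, format_endpoint produces the string "192.0.2.7:2049".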
While the connection-oriented TCP protocol provides an error-free byte stream, UDP is unreliable. A UDP datagram might not arrive at its destination, or it might arrive before a message that was sent earlier. The sender has no information about the success or failure of the transmission. Even if the datagram arrives at the destination host, the network subsystem might drop the message before delivering it to the application because the endpoint buffers are full. Thus, while UDP has very low overhead, the application must handle considerably more complex errors than with TCP.
UDP datagrams are transmitted atomically; that is, a given datagram either arrives in its entirety at the destination endpoint or it does not arrive at all. To achieve this, modern network subsystems reassemble fragmented UDP datagrams and verify UDP checksums. If a checksum is not correct, the subsystem discards the packet. Unfortunately, the computation of UDP checksums in IPv4 is optional, and some older systems disable checking by default. The UDP checksum guards against transmission errors, but not against malicious attackers. Such an attacker could modify both the data and the checksum in a consistent way. UDP does not authenticate what was sent and so has no way of detecting that an attack has occurred. Authentication must take place in a higher-level layer or in the application itself.
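The checksum that the network subsystem verifies is a 16-bit one's-complement sum. The sketch below computes that sum over a byte buffer in the manner described by RFC 1071. It is deliberately simplified: a real UDP checksum also covers an IP pseudo-header and the UDP header, which are omitted here, and the name inet_checksum is hypothetical. A receiver that sums the data together with the transmitted checksum expects a result of 0. This also shows why the checksum gives no protection against an attacker, since anyone who changes the data can simply recompute it.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement Internet checksum (RFC 1071 style) over len bytes.
   Simplified: omits the pseudo-header and UDP header a real UDP checksum covers. */
uint16_t inet_checksum(const uint8_t *data, size_t len) {
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)           /* add 16-bit big-endian words */
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)                               /* pad an odd final byte with zero */
        sum += (uint32_t)data[len - 1] << 8;
    while (sum >> 16)                          /* fold carries back into 16 bits */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;                     /* one's complement of the sum */
}
```

For the bytes 00 01 f2 03 f4 f5 f6 f7, the folded sum is 0xddf2 and the checksum is 0x220d. Appending 22 0d and summing again yields 0, which is how a receiver accepts the data.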
As with connection-oriented protocols, we introduce a simplified interface for connectionless communication, based on a socket implementation with UDP. Section 20.2 describes the UICI UDP interface. Sections 20.3 and 20.4 use this interface to implement the simple-request and the request-reply protocols, respectively. Section 20.5 adds timeouts and retries to the request-reply protocol. Section 20.6 outlines the implementation of request-reply-acknowledge protocols. Section 20.7 describes the implementation of each function in the UICI UDP interface in terms of sockets and UDP. Section 20.8 compares the UDP and TCP protocols. Section 20.9 discusses multicast communication and adds two functions to UICI UDP to support multicast communication.
20.2 Simplified Interface for Connectionless Communication
Connectionless communication using UDP is based on the sendto and recvfrom functions. The UICI UDP connectionless communication interface has u_sendto, u_sendtohost, u_recvfrom and u_recvfromtimed that provide the same functionality, but with simpler parameters. Also, unlike the underlying UDP functions, the UICI UDP functions restart themselves after being interrupted by signals. Table 20.1 summarizes the UICI UDP interface to connectionless communication. To use these functions, you must compile your programs with both the UICI name and the UICI UDP libraries. Include both uiciname.h and uiciudp.h in your source files. Section 20.2.2 discusses error handling with the UICI UDP functions.
Table 20.1. Summary of UICI UDP calls.
UICI UDP prototype | description
int u_openudp(u_port_t port) | creates a UDP socket and, if port > 0, binds the socket to port; returns the socket file descriptor
ssize_t u_recvfrom(int fd, void *buf, size_t nbytes, u_buf_t *ubufp) | waits for up to nbytes from socket fd; returns the number of bytes received; on return, buf holds the received bytes and ubufp points to the sender address
ssize_t u_recvfromtimed(int fd, void *buf, size_t nbytes, u_buf_t *ubufp, double time) | waits at most time seconds for up to nbytes from socket fd; returns the number of bytes received; on return, buf holds the received bytes and ubufp points to the sender address
ssize_t u_sendto(int fd, void *buf, size_t nbytes, u_buf_t *ubufp) | sends nbytes of buf on socket fd to the receiver specified by ubufp; returns the number of bytes actually sent
ssize_t u_sendtohost(int fd, void *buf, size_t nbytes, char *hostn, u_port_t port) | sends nbytes of buf on socket fd to the receiver specified by hostn and port; returns the number of bytes actually sent
void u_gethostname(u_buf_t *ubufp, char *hostn, int hostnsize) | copies the host name specified by ubufp into the buffer hostn of size hostnsize
void u_gethostinfo(u_buf_t *ubufp, char *info, int infosize) | copies a printable string containing the host name and port specified by ubufp into the user-supplied buffer info of size infosize
int u_comparehost(u_buf_t *ubufp, char *hostn, u_port_t port) | returns 1 if the host and port specified by ubufp match the given host name and port number; otherwise returns 0
The u_openudp function returns a file descriptor that is a handle to a UDP socket. This function takes a single integer parameter, port, specifying the port number to bind to. If port is zero, the socket does not bind to a port. Typically, a server binds to a port and a client does not.
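Here is a minimal sketch of what u_openudp might do underneath, for readers curious about the layer below UICI. It assumes IPv4 and plain POSIX sockets. The name openudp_sketch is hypothetical. Section 20.7 describes the actual UICI implementation, which may differ in details such as the socket options it sets.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for u_openudp, assuming IPv4: create a UDP socket
   and bind it to port on all local interfaces only when port > 0. */
int openudp_sketch(unsigned short port) {
    struct sockaddr_in server;
    int fd, saved;

    if ((fd = socket(AF_INET, SOCK_DGRAM, 0)) == -1)
        return -1;
    if (port > 0) {                             /* servers bind; clients usually do not */
        memset(&server, 0, sizeof(server));
        server.sin_family = AF_INET;
        server.sin_addr.s_addr = htonl(INADDR_ANY);
        server.sin_port = htons(port);          /* network byte order */
        if (bind(fd, (struct sockaddr *)&server, sizeof(server)) == -1) {
            saved = errno;                      /* close may change errno */
            close(fd);
            errno = saved;
            return -1;
        }
    }
    return fd;
}
```

A server would call openudp_sketch with its well-known port. A client would pass 0 and let its first send pick an ephemeral port.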
The u_recvfrom function reads up to nbytes from the file descriptor fd into the user-provided buffer buf and returns the number of bytes read. The u_recvfrom function fills in the user-supplied u_buf_t structure pointed to by ubufp with the address of the sender.
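The u_recvfrom behavior just described can be approximated directly on top of recvfrom. This rough sketch assumes IPv4, so a struct sockaddr_in stands in for the opaque u_buf_t. The restart loop mirrors the UICI property of restarting after signals. The name recvfrom_restart is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical stand-in for u_recvfrom (IPv4): receive one datagram,
   restarting if a signal interrupts the call, and record the sender's
   address in *sender so it can serve as a return address. */
ssize_t recvfrom_restart(int fd, void *buf, size_t nbytes, struct sockaddr_in *sender) {
    ssize_t n;
    socklen_t len;

    do {
        len = sizeof(*sender);          /* value-result argument: reset on each try */
        n = recvfrom(fd, buf, nbytes, 0, (struct sockaddr *)sender, &len);
    } while (n == -1 && errno == EINTR);
    return n;
}
```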
The u_recvfromtimed function is similar to u_recvfrom, but it takes an additional time parameter that specifies the number of seconds that u_recvfromtimed should wait for a message before returning with an error. The time parameter is a double, allowing fine-grained time values. Because messages may be lost, robust receivers call u_recvfromtimed to avoid blocking indefinitely.
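One way to provide such a timeout is to wait for the socket to become readable before calling recvfrom. The sketch below does this with the POSIX poll call and assumes IPv4. The name recvfromtimed_sketch is hypothetical, and Section 20.7 discusses the strategies the real u_recvfromtimed may use instead. On an interrupted poll, the code restarts with only the time that remains, so signals do not extend the total wait.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Milliseconds left until *deadline (CLOCK_MONOTONIC), rounded up and never
   negative. Assumes timeouts far below INT_MAX milliseconds. */
static int ms_left(const struct timespec *deadline) {
    struct timespec now;
    double ms;
    int whole;

    clock_gettime(CLOCK_MONOTONIC, &now);
    ms = (double)(deadline->tv_sec - now.tv_sec) * 1000.0 +
         (double)(deadline->tv_nsec - now.tv_nsec) / 1e6;
    if (ms <= 0.0)
        return 0;
    whole = (int)ms;
    return whole + (ms > (double)whole);        /* round up so poll never wakes early */
}

/* Hypothetical stand-in for u_recvfromtimed (IPv4): wait at most seconds
   for one datagram; on timeout return -1 with errno set to ETIME. */
ssize_t recvfromtimed_sketch(int fd, void *buf, size_t nbytes,
                             struct sockaddr_in *sender, double seconds) {
    struct timespec deadline;
    struct pollfd pfd;
    socklen_t len;
    ssize_t n;
    int ready;

    clock_gettime(CLOCK_MONOTONIC, &deadline);  /* deadline = now + seconds */
    deadline.tv_sec += (time_t)seconds;
    deadline.tv_nsec += (long)((seconds - (double)(time_t)seconds) * 1e9);
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }
    pfd.fd = fd;
    pfd.events = POLLIN;
    for ( ; ; ) {
        ready = poll(&pfd, 1, ms_left(&deadline));
        if (ready == -1 && errno == EINTR)
            continue;                           /* restart with the time left */
        if (ready == -1)
            return -1;
        if (ready == 0) {
            errno = ETIME;                      /* UICI's timeout convention */
            return -1;
        }
        len = sizeof(*sender);
        n = recvfrom(fd, buf, nbytes, 0, (struct sockaddr *)sender, &len);
        if (n == -1 && errno == EINTR)
            continue;
        return n;
    }
}
```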
The u_sendto function transmits nbytes from buf through the socket fd to the destination pointed to by ubufp. The u_sendto function requires a destination parameter because the communication endpoint is capable of sending to any host or receiving from any host. Use a u_buf_t value set by u_recvfrom to respond to a particular sender.
The u_sendtohost function is similar to u_sendto, but it requires a host name and port number rather than a pointer to a u_buf_t structure to specify the destination. Clients use u_sendtohost to initiate a communication with a server on a well-known port.
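A possible shape for u_sendtohost is sketched below. It assumes IPv4 and uses getaddrinfo for name resolution, taking only the first address returned. The real implementation may resolve names differently, and the name sendtohost_sketch is hypothetical. As the text specifies for UICI, a name that cannot be resolved produces -1 with errno set to EINVAL.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical stand-in for u_sendtohost (IPv4): resolve hostn, then send
   one datagram to (hostn, port) using the first address found. */
ssize_t sendtohost_sketch(int fd, const void *buf, size_t nbytes,
                          const char *hostn, unsigned short port) {
    struct addrinfo hints, *res;
    char portstr[6];                            /* "65535" plus '\0' */
    ssize_t n;
    int saved;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;                  /* match an IPv4 UDP socket */
    hints.ai_socktype = SOCK_DGRAM;
    snprintf(portstr, sizeof(portstr), "%u", (unsigned)port);
    if (getaddrinfo(hostn, portstr, &hints, &res) != 0) {
        errno = EINVAL;                         /* UICI convention for lookup failure */
        return -1;
    }
    do {                                        /* restart if a signal interrupts */
        n = sendto(fd, buf, nbytes, 0, res->ai_addr, res->ai_addrlen);
    } while (n == -1 && errno == EINTR);
    saved = errno;
    freeaddrinfo(res);
    errno = saved;
    return n;
}
```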
20.2.1 Host names and the u_buf_t structure
To be implementation-independent, applications that use UICI UDP should treat u_buf_t objects as opaque and use them in u_sendto without parsing. Appendix C provides an implementation of UICI UDP with IPv4, but it is also possible to implement UICI UDP with IPv6. The u_buf_t structure would be different for the two implementations. Three UICI UDP functions provide access to the information in the u_buf_t structure in an implementation-independent way. The u_gethostname function returns the host name encoded in a u_buf_t structure. The u_gethostinfo function returns a printable string containing a u_buf_t structure's information about host name and port number and can be used for debugging. The u_comparehost function returns 1 if the information in u_buf_t matches the specified host name and port number. Use u_comparehost to verify the identity of a sender.
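Because u_buf_t is opaque, a function like u_comparehost must resolve the host name and compare addresses rather than compare strings. Here is a minimal sketch that assumes the IPv4 representation (struct sockaddr_in) and getaddrinfo. The name comparehost_sketch is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical stand-in for u_comparehost (IPv4): return 1 if the sender's
   port equals port and its address is one of the addresses hostn resolves
   to; otherwise return 0. */
int comparehost_sketch(const struct sockaddr_in *sender, const char *hostn,
                       unsigned short port) {
    struct addrinfo hints, *res, *p;
    int match = 0;

    if (ntohs(sender->sin_port) != port)
        return 0;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_DGRAM;
    if (getaddrinfo(hostn, NULL, &hints, &res) != 0)
        return 0;                               /* an unresolvable name cannot match */
    for (p = res; p != NULL && !match; p = p->ai_next) {
        const struct sockaddr_in *cand = (const struct sockaddr_in *)p->ai_addr;
        match = (cand->sin_addr.s_addr == sender->sin_addr.s_addr);
    }
    freeaddrinfo(res);
    return match;
}
```

Note that this compares addresses only. It cannot detect a sender that forges its source address.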
20.2.2 UICI UDP return errors
The u_gethostname and u_gethostinfo functions return information in user-supplied buffers and cannot return an error code. The u_comparehost function returns 1 (true) if the hosts and ports match and 0 (false) if they do not. The other UICI UDP functions return -1 on error and set errno. If u_recvfromtimed times out, it sets errno to ETIME. If u_sendtohost cannot resolve the host name, it sets errno to EINVAL. Other errno settings match those of the underlying socket calls, as explained in Section 20.7. When a UICI UDP function returns an error and sets errno, you can use perror or strerror to display an appropriate error message, as long as you take into account that these functions may not be thread-safe.
20.2.3 UDP buffer size and UICI UDP
Messages sent under UDP are received atomically, meaning that a message sent with u_sendto or u_sendtohost is either transmitted entirely or not at all. A given implementation of UDP has a maximum message size. If you attempt to send a message that is too large, u_sendto or u_sendtohost returns -1 and sets errno to EMSGSIZE.
The u_recvfrom function reads exactly one message. If the message is no larger than nbytes, u_recvfrom returns the number of bytes actually read and buf contains the entire message. If the message is larger than nbytes, u_recvfrom fills buf and truncates the message. In this case, u_recvfrom does not generate an error and returns the number of bytes put in the buffer (i.e., the size of the buffer).
Care must be taken to ensure that the receive buffer is large enough for the message, since UICI UDP truncates the message rather than generating an error when the buffer is too small. One way to handle this is to make the buffer one byte larger than the size expected and have the calling program generate an error if the buffer is completely filled.
Each UDP datagram is passed to the lower layers of the network protocol and encapsulated as a packet (header + data) in an IP datagram for transmission on the network. The network also imposes size limitations that affect transmission of datagrams. Each link in a path on the network has an MTU (maximum transmission unit), the largest chunk of information that the link can transmit. A datagram may be broken up into pieces (fragments) so that it can be physically transmitted along a link. These fragments are reassembled only when they reach the destination host. If any fragment is missing, the entire datagram is lost. While most UDP implementations allow datagrams of 8192 bytes, the typical network link has an MTU considerably smaller (e.g., 1500 bytes for Ethernet). As of this writing, most hosts and routers on the Internet use the IPv4 protocol for exchanging information. Under IPv4, hosts are not required to receive IP datagrams larger than 576 bytes, so many applications that use UDP limit their message size to fit in a datagram of this size, that is, 576 - 20 (IP header) - 8 (UDP header) = 548 bytes.
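The size arithmetic above can be written down directly. This sketch assumes IPv4 headers without options. The enum names and the ipv4_fragments helper are illustrative, not part of UICI. The helper counts fragments using the IPv4 rule that every fragment except the last carries a multiple of 8 data bytes. It shows why large datagrams are fragile: an 8192-byte payload crosses an Ethernet link as six fragments, and losing any one of them loses the whole datagram.

```c
#include <stddef.h>

/* Sizes from the text; assumes IPv4 headers carry no options. */
enum {
    IPV4_MIN_REASSEMBLY = 576,  /* largest datagram every IPv4 host must accept */
    IPV4_HEADER = 20,           /* minimum IPv4 header size */
    UDP_HEADER = 8,
    ETHERNET_MTU = 1500,
    SAFE_UDP_PAYLOAD = IPV4_MIN_REASSEMBLY - IPV4_HEADER - UDP_HEADER,  /* 548 */
    ETHERNET_UDP_PAYLOAD = ETHERNET_MTU - IPV4_HEADER - UDP_HEADER      /* 1472 */
};

/* Hypothetical helper: number of IPv4 fragments needed to carry a UDP
   message of payload bytes across a link with the given MTU. */
size_t ipv4_fragments(size_t payload, size_t mtu) {
    size_t ipdata = payload + UDP_HEADER;      /* bytes that follow the IP header */
    size_t maxdata = mtu - IPV4_HEADER;        /* IP data one packet can hold */
    size_t perfrag = (maxdata / 8) * 8;        /* non-final fragments: multiple of 8 */

    if (ipdata <= maxdata)
        return 1;                              /* fits without fragmentation */
    return 1 + (ipdata - maxdata + perfrag - 1) / perfrag;
}
```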
Exercise 20.1
How would you modify u_recvfrom so that it detects messages that are too large for the buffer?
Answer:
Modify u_recvfrom to use malloc to allocate a buffer one byte larger than the buffer passed in. Receive the message into this larger buffer. If the number of bytes received equals the size of the larger buffer, the message did not fit, so u_recvfrom should return -1 and set errno to an appropriate value. One possible value to use is EMSGSIZE. Otherwise, u_recvfrom should copy the message into buf, the buffer that the caller passed as a parameter. In either case, u_recvfrom must free the temporary buffer.
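The answer above translates into code roughly as follows. Rather than modifying the actual UICI source, this sketch calls recvfrom directly and assumes IPv4. The name recvfrom_checked is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical sketch of the Exercise 20.1 answer: receive into a temporary
   buffer one byte larger than buf; a full temporary buffer means the
   datagram did not fit, so report EMSGSIZE instead of truncating silently. */
ssize_t recvfrom_checked(int fd, void *buf, size_t nbytes, struct sockaddr_in *sender) {
    char *tmp;
    socklen_t len;
    ssize_t n;
    int saved;

    if ((tmp = malloc(nbytes + 1)) == NULL)
        return -1;
    do {
        len = sizeof(*sender);
        n = recvfrom(fd, tmp, nbytes + 1, 0, (struct sockaddr *)sender, &len);
    } while (n == -1 && errno == EINTR);
    if (n == (ssize_t)(nbytes + 1)) {
        errno = EMSGSIZE;                      /* message too large for buf */
        n = -1;
    } else if (n >= 0) {
        memcpy(buf, tmp, (size_t)n);           /* message fits: hand it to the caller */
    }
    saved = errno;
    free(tmp);                                 /* free the temporary in every case */
    errno = saved;
    return n;
}
```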
20.3 Simple-Request Protocols
A protocol is a set of rules that endpoints follow when they communicate. Simple request [110] is a client-server protocol in which a client sends a request to the server but expects no reply. Figure 20.2 shows a schematic of the steps involved in implementing a simple-request protocol using UICI UDP.
Programs 20.1 and 20.2 illustrate the simple-request protocol. The server creates a UDP socket associated with a well-known port (u_openudp) and then waits for a request from any sender (u_recvfrom). The server blocks on u_recvfrom until receiving a message. The server responds by writing the remote host name and received message to standard output and then waits in a loop for another message.
Exercise 20.2
Under what conditions does the server of Program 20.1 exit?
Answer:
The server exits if it is given the wrong number of command-line arguments or if u_openudp fails. After that, the server will not exit unless it receives a signal. No transmission by a client can cause the server to exit.
Program 20.1 server_udp.c
A server program writes sender information and the received message to its standard output.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "restart.h"
#include "uiciudp.h"
#define BUFSIZE 1024
int main(int argc, char *argv[]) {
   char buf[BUFSIZE];
   ssize_t bytesread;
   char hostinfo[BUFSIZE];
   u_port_t port;
   int requestfd;
   u_buf_t senderinfo;
   if (argc != 2) {
      fprintf(stderr, "Usage: %s port\n", argv[0]);
      return 1;
   }
   port = (u_port_t) atoi(argv[1]);          /* create communication endpoint */
   if ((requestfd = u_openudp(port)) == -1) {
      perror("Failed to create UDP endpoint");
      return 1;
   }
   for ( ; ; ) {                             /* process client requests */
      bytesread = u_recvfrom(requestfd, buf, BUFSIZE, &senderinfo);
      if (bytesread < 0) {
         perror("Failed to receive request");
         continue;
      }
      u_gethostinfo(&senderinfo, hostinfo, BUFSIZE);
      if ((r_write(STDOUT_FILENO, hostinfo, strlen(hostinfo)) == -1) ||
          (r_write(STDOUT_FILENO, buf, bytesread) == -1)) {
         perror("Failed to echo request to standard output");
      }
   }
}
The client of Program 20.2 creates a UDP socket by calling u_openudp with a parameter of 0. In this case, u_openudp does not bind the socket to a port. The client initiates a request by calling u_sendtohost, specifying the host name and the well-known port of the server. Since the client has not bound its socket to a port, the first send on the socket causes the network subsystem to assign a private port number, called an ephemeral port, to the socket. The client of Program 20.2 sends a single request and then calls r_close to release the resources associated with the communication endpoint. Notice that the server does not detect an error or end-of-file when the client closes its socket, because there is no connection between the endpoints in the two applications.
Program 20.2 client_udp.c
A client program that sends a request containing its process ID.
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "restart.h"
#include "uiciudp.h"
#define BUFSIZE 1024
int main(int argc, char *argv[]) {
   ssize_t byteswritten;
   char request[BUFSIZE];
   int requestfd;
   int rlen;
   u_port_t serverport;
   if (argc != 3) {
      fprintf(stderr, "Usage: %s servername serverport\n", argv[0]);
      return 1;
   }
   serverport = (u_port_t) atoi(argv[2]);
   if ((requestfd = u_openudp(0)) == -1) {   /* create unbound UDP endpoint */
      perror("Failed to create UDP endpoint");
      return 1;
   }
   sprintf(request, "[%ld]\n", (long)getpid());         /* create a request */
   rlen = strlen(request);
   /* use simple-request protocol to send a request to (server, serverport) */
   byteswritten = u_sendtohost(requestfd, request, rlen, argv[1], serverport);
   if (byteswritten == -1)
      perror("Failed to send");
   if (r_close(requestfd) == -1 || byteswritten == -1)
      return 1;
   return 0;
}
Exercise 20.3
Compile Programs 20.1 and 20.2. Start the server on one machine (say, yourhost) with the following command.
server_udp 20001
Run clients on different hosts by executing the following on several machines.
client_udp yourhost 20001
Observe the assignment of ephemeral port numbers. What output does the server produce? How about the clients?
Answer:
Ephemeral ports are assigned in a system-dependent way. If all goes well, the clients do not produce output. For each message sent by a client, the server produces a line of output. If a client with process ID 2345 runs on machine myhost and uses ephemeral port 56525, the following message appears on standard output of the server.
port number is 56525 on host myhost[2345]
Figure 20.3 uses a time line to depict a sequence of events produced by the simple-request protocol. The diagram assumes that the client and the server have created their communication endpoints before the time line starts. Black dots represent event times relative to the same clock. For functions, the dots indicate the times at which the function returns to the caller. Remember that the clock times observed by the client and server are usually not synchronized unless the client and server are on the same machine.
The u_sendtohost function is nonblocking in the sense that it returns after copying the message to the network subsystem of the local machine. The u_recvfrom function blocks until it receives a message or an error occurs. The u_recvfrom function restarts itself after receiving a signal, in contrast to the underlying library function recvfrom, as explained in Section 20.7.
Exercise 20.4
Run Program 20.2 without starting the corresponding server. What happens?
Answer:
UDP does not determine whether the receiving host and its server program exist, so the client cannot detect that no server is running. The client generates an error only if it cannot resolve the server host name.
Exercise 20.5
Figure 20.3 assumes that the server has been started before the client and is ready to receive when the message arrives. What happens if the client's message arrives before the server has created its communication endpoint? What happens if the client's message arrives after the server has created its endpoint but before it has called u_recvfrom?
Answer:
If the client's message arrives before the server has created its endpoint, the message is lost. In the second case, the result depends on how much buffer space has been allocated for the endpoint and how many messages have already arrived for that endpoint. If the endpoint's buffer has room, the network subsystem of the server host stores the message in the endpoint's buffer. The server calls u_recvfrom to remove the message. Communication is an asynchronous process, and a major role of the communication endpoint is for the network and I/O subsystems to provide buffering for incoming messages until user processes are ready for them.
Exercise 20.6
Modify the client in Program 20.2 to send 1000 requests, and modify the server in Program 20.1 to sleep for 10 seconds between the u_openudp call and the for loop that processes requests. Start the server and immediately start the client. How many messages does the server receive?
Answer:
The answer depends on the size of the endpoint buffers. You might see about 100 messages delivered. If all of the messages are delivered, try increasing the number of messages sent by the client to 10,000.
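The endpoint buffer size that governs this answer can be inspected and adjusted with the standard SO_RCVBUF socket option. The sketch below is minimal. The helper names get_rcvbuf and set_rcvbuf are hypothetical, and UICI itself provides no such calls. Because u_openudp returns an ordinary file descriptor, helpers like these could presumably be applied to the server's socket before repeating the experiment.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/socket.h>

/* Hypothetical helper: the socket's receive-buffer size as reported through
   SO_RCVBUF, in bytes, or -1 on error. */
int get_rcvbuf(int fd) {
    int size;
    socklen_t len = sizeof(size);

    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, &len) == -1)
        return -1;
    return size;
}

/* Hypothetical helper: request a receive buffer of size bytes. The system may
   round or cap the value (Linux, for example, doubles it and enforces a
   system-wide maximum), so read it back rather than assume it was honored. */
int set_rcvbuf(int fd, int size) {
    return setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, sizeof(size));
}
```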
Figure 20.3 illustrates the ideal scenario, in which the client's message successfully arrives at the server and is processed. In reality, today's network infrastructure provides no guarantee that all messages actually arrive. Figure 20.4 illustrates a scenario in which the message is lost because of a network error. The server has no knowledge of the message's existence.
Exercise 20.7
Draw a timing diagram similar to those of Figures 20.3 and 20.4 that illustrates a scenario in which the server receives a client request and then crashes before processing the request.
Answer:
Relabel the second event dot on the server's time line in Figure 20.3 as a crash event.
20.4 Request-Reply Protocols
In the simple-request protocol, the client cannot distinguish the scenario of Figure 20.3 from those of Figure 20.4 and Exercise 20.7 because it does not receive an acknowledgment of its request or any results produced by the request. A request-reply protocol handles this problem by requiring that the server respond to the client. Figure 20.5 shows a sequence of steps, using UICI UDP, to implement a simplified request-reply protocol. If no errors occur, the server's reply message notifies the client that the transmission was successful. The server reply message can contain actual results or just a flag reporting the status of the request.
Program 20.3 shows the server-side implementation of the request-reply protocol of Figure 20.5. The server receives a request and uses u_gethostinfo to extract the identity of the client. After printing the client's name and request to STDOUT_FILENO, the server uses u_sendto with the u_buf_t structure (senderinfo) returned from u_recvfrom to respond to that client. The UICI UDP u_sendto function uses the u_buf_t structure as the destination address to ensure that the reply is directed to the correct client. The server shown here replies with a copy of the request it received.
Exercise 20.8
An important consideration in writing a server is to decide which conditions should cause the server to exit, which conditions should be ignored, which conditions should be logged and which conditions should trigger a recovery procedure. The server of Program 20.3 never exits on its own once its port is bound to the socket. You can terminate the server by sending it a signal. Under what conditions would it be reasonable for a server such as an ftp server to exit?
Answer:
You could argue that an ftp server should never exit because it should be running at all times. Certainly, an error caused by a client should not terminate the server. Even if system resources are not available to handle a connection, the problem might be temporary and the server would continue to work after the problem is resolved. Errors should be logged so the administrator has a record of any problems.
Program 20.3 server_udp_request_reply.c
A server program that implements a request-reply protocol.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "restart.h"
#include "uiciudp.h"
#define BUFSIZE 1024
int main(int argc, char *argv[]) {
   char buf[BUFSIZE];
   ssize_t bytesread;
   char hostinfo[BUFSIZE];
   u_port_t port;
   int requestfd;
   u_buf_t senderinfo;
   if (argc != 2) {
      fprintf(stderr, "Usage: %s port\n", argv[0]);
      return 1;
   }
   port = (u_port_t) atoi(argv[1]);          /* create UDP endpoint for port */
   if ((requestfd = u_openudp(port)) == -1) {
      perror("Failed to create UDP endpoint");
      return 1;
   }
   for ( ; ; ) {                  /* process client requests and send replies */
      bytesread = u_recvfrom(requestfd, buf, BUFSIZE, &senderinfo);
      if (bytesread == -1) {
         perror("Failed to receive client request");
         continue;
      }
      u_gethostinfo(&senderinfo, hostinfo, BUFSIZE);
      if ((r_write(STDOUT_FILENO, hostinfo, strlen(hostinfo)) == -1) ||
          (r_write(STDOUT_FILENO, buf, bytesread) == -1)) {
         perror("Failed to echo client request to standard output");
      }
      if (u_sendto(requestfd, buf, bytesread, &senderinfo) == -1) {
         perror("Failed to send the reply to the client");
      }
   }
}
Program 20.4 shows a client that uses the request-reply protocol of Figure 20.5. The request is just a string containing the process ID of the requesting process. The protocol is implemented in the request_reply function shown in Program 20.5. The client sends the initial request and then waits for the reply. Since anyone can send a message to an open port, the client checks the host/port information against the sender information supplied in senderinfo to make sure that the reply came from the same host and port to which it sent the request.
Program 20.4 client_udp_request_reply.c
A client program that sends a request containing its process ID and reads the reply.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "restart.h"
#include "uiciudp.h"
#define BUFSIZE 1024
int request_reply(int requestfd, void* request, int reqlen,
                  char* server, int serverport, void *reply, int replen);
int main(int argc, char *argv[]) {
   ssize_t bytesread, byteswritten;
   char reply[BUFSIZE];
   char request[BUFSIZE];
   int requestfd;
   u_port_t serverport;
   if (argc != 3) {
      fprintf(stderr, "Usage: %s servername serverport\n", argv[0]);
      return 1;
   }
   serverport = (u_port_t) atoi(argv[2]);
   if ((requestfd = u_openudp(0)) == -1) {   /* create unbound UDP endpoint */
      perror("Failed to create UDP endpoint");
      return 1;
   }
   sprintf(request, "[%ld]\n", (long)getpid());         /* create a request */
   /* use request-reply protocol to send a message */
   bytesread = request_reply(requestfd, request, strlen(request)+1,
                             argv[1], serverport, reply, BUFSIZE);
   if (bytesread == -1)
      perror("Failed to do request_reply");
   else {
      byteswritten = r_write(STDOUT_FILENO, reply, bytesread);
      if (byteswritten == -1)
         perror("Failed to echo server reply");
   }
   if ((r_close(requestfd) == -1) || (bytesread == -1) || (byteswritten == -1))
      return 1;
   return 0;
}
Exercise 20.9
What happens when the scenario of Figure 20.4 occurs for the request-reply protocol of Figure 20.5?
Answer:
The client hangs indefinitely on the blocking u_recvfrom call.
Program 20.5 request_reply.c
Request-reply implementation that assumes error-free delivery.
#include <sys/types.h>
#include "uiciudp.h"
int request_reply(int requestfd, void* request, int reqlen,
                  char* server, int serverport, void *reply, int replen) {
   ssize_t nbytes;
   u_buf_t senderinfo;
   /* send the request */
   nbytes = u_sendtohost(requestfd, request, reqlen, server, serverport);
   if (nbytes == -1)
      return (int)nbytes;
   /* wait for a response, restart if from wrong server */
   while ((nbytes = u_recvfrom(requestfd, reply, replen, &senderinfo)) >= 0 )
      if (u_comparehost(&senderinfo, server, serverport))   /* sender match */
         break;
   return (int)nbytes;
}
Exercise 20.10
Compile Programs 20.3 and 20.4. Start the server on one machine (say, yourhost) with the following command.
server_udp_request_reply 20001
Run clients on different hosts by executing the following on several machines.
client_udp_request_reply yourhost 20001
Put timing statements in Program 20.4 to measure how long it takes for the client to send a request and receive a response. (See Example 9.8.) Run the client program several times. Do any of the instances hang? Under what circumstances would you expect the client to hang?
Answer:
The client blocks indefinitely on u_recvfrom if it does not receive the reply from the server. Modern networks have become so reliable that if the client and server are running on the same local area network (LAN), it is unlikely that either the request or the reply messages will be lost because of errors along particular wires. In high-congestion situations, packets may be dropped at LAN switches. If many clients are making simultaneous requests, the network subsystem of the server host might discard some packets because the communication endpoint's buffers are full. Messages from clients and servers on different LANs generally follow paths consisting of many links connected by routers. Congested routers drop messages that they can't handle, increasing the likelihood that a message is not delivered.
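For the timing statements that Exercise 20.10 asks for, one option is to take two timestamps around the call to request_reply and subtract them. Example 9.8 may use a different timer. This sketch uses clock_gettime with CLOCK_MONOTONIC, which is unaffected by changes to the wall clock, and the name elapsed_usec is hypothetical. The comment shows one possible placement in Program 20.4.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Microseconds from *start to *end, for timestamps taken with
   clock_gettime(CLOCK_MONOTONIC, ...). */
long long elapsed_usec(const struct timespec *start, const struct timespec *end) {
    long long ns = (long long)(end->tv_sec - start->tv_sec) * 1000000000LL +
                   (end->tv_nsec - start->tv_nsec);
    return ns / 1000;
}

/* Possible placement in Program 20.4 (not part of the original program):
 *    struct timespec t0, t1;
 *    clock_gettime(CLOCK_MONOTONIC, &t0);
 *    bytesread = request_reply(requestfd, request, strlen(request)+1,
 *                              argv[1], serverport, reply, BUFSIZE);
 *    clock_gettime(CLOCK_MONOTONIC, &t1);
 *    fprintf(stderr, "round trip: %lld usec\n", elapsed_usec(&t0, &t1));
 */
```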
Exercise 20.11
Figure 20.6 illustrates the timing for the request-reply protocol when there are no errors. When errors are possible, the nine events listed in the following table can occur in various orders.
event | description
A | client sends request message
B | server receives request message
C | server processes request
D | server sends reply message
E | client receives reply message
F | request message is lost
G | reply message is lost
H | client crashes
I | server crashes
The event sequence ABCDE represents the scenario of Figure 20.6. For the five event sequences listed below, state whether each represents a physically realizable scenario. If the scenario is realizable, explain the outcome and draw a timing diagram similar to that shown in Figure 20.6. If the scenario is not realizable, explain why.
ABCED, ABCDG, ABCI, ABCGD, ABCDIE
What other event sequences represent possible scenarios for request-reply?
Answer:
ABCED is not realizable, since the client cannot receive a message before the server sends it. This assumes that no other process on the server host has guessed the ephemeral port number used by the client and sent a bogus reply. It also assumes that another host has not spoofed the IP address of the server. We do not consider these scenarios here. ABCDG is realizable and represents a situation in which the client does not receive a response even though the server has processed the request. ABCI is realizable and represents a situation in which the server receives the request and processes it but crashes before it sends the response. ABCGD is not realizable, since a message cannot be lost before it is sent. ABCDIE is possible. If the server crashes after it sends the reply, the reply can still be received.
Many other event sequences represent realizable scenarios.
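The realizability arguments in this answer rest on a few ordering rules, which can be checked mechanically. The sketch below encodes a simplified model assumed for illustration, not taken from the text: each event occurs at most once; a message must be sent before it is received or lost; a lost message is never received; and a crashed side performs no further actions. Like the answer, it ignores spoofed or bogus replies. The function name realizable is hypothetical.

```c
#include <stdbool.h>

/* Simplified model of the events A..I of Exercise 20.11 (assumed rules). */
bool realizable(const char *seq) {
    static const char need[] = "-ABCDAD--";   /* prerequisite of A..I, '-' = none */
    bool seen[9] = {false};
    int i;

    for (i = 0; seq[i] != '\0'; i++) {
        char c = seq[i];
        if (c < 'A' || c > 'I' || seen[c - 'A'])
            return false;                     /* unknown or repeated event */
        if (need[c - 'A'] != '-' && !seen[need[c - 'A'] - 'A'])
            return false;                     /* e.g. reply received before it is sent */
        if ((c == 'B' && seen['F' - 'A']) || (c == 'F' && seen['B' - 'A']))
            return false;                     /* request both lost and received */
        if ((c == 'E' && seen['G' - 'A']) || (c == 'G' && seen['E' - 'A']))
            return false;                     /* reply both lost and received */
        if (seen['I' - 'A'] && (c == 'B' || c == 'C' || c == 'D'))
            return false;                     /* server acts after crashing */
        if (seen['H' - 'A'] && (c == 'A' || c == 'E'))
            return false;                     /* client acts after crashing */
        seen[c - 'A'] = true;
    }
    return true;
}
```

Under these rules the checker agrees with the answer: ABCED and ABCGD are rejected because E and G each require D to have happened first, while ABCDG, ABCI and ABCDIE are accepted.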
20.5 Request-Reply with Timeouts and Retries
The client of Program 20.4 can hang indefinitely if either the request message or the reply message is lost or if the server crashes. The client can use timeouts to handle these potential deadlocks. Before making a blocking call, the process sets a timer that generates a signal to interrupt the call after a certain length of time. If the interrupt occurs, the process can try again or use a different strategy.
You can implement a timeout directly by setting a software timer or by using timeout facilities included as options to calls such as select. Sockets themselves have some options for setting timeouts. Section 20.7 discusses the pros and cons of different timeout strategies.
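Of the timeout options mentioned above, the SO_RCVTIMEO socket option is the simplest to show. The sketch below is minimal, and set_recv_timeout and is_timeout_errno are hypothetical names. Unlike u_recvfromtimed, a receive that times out this way reports EAGAIN or EWOULDBLOCK rather than ETIME. The timeout also stays in effect for every later receive on the socket.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Set a receive timeout with SO_RCVTIMEO. seconds must be positive: a zero
   timeval means "never time out". */
int set_recv_timeout(int fd, double seconds) {
    struct timeval tv;

    tv.tv_sec = (time_t)seconds;
    tv.tv_usec = (suseconds_t)((seconds - (double)tv.tv_sec) * 1e6);
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* Returns 1 if err indicates that a receive timed out under SO_RCVTIMEO. */
int is_timeout_errno(int err) {
    return err == EAGAIN || err == EWOULDBLOCK;
}
```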
The u_recvfromtimed function of UICI UDP provides a simple interface to these timeout facilities. The u_recvfromtimed function is similar to u_recvfrom, but it takes an additional double parameter, time, indicating the number of seconds to block while waiting for a response. After blocking for time seconds without receiving a response on the specified endpoint, u_recvfromtimed returns -1 and sets errno to ETIME. For other errors, u_recvfromtimed returns -1 and sets errno as u_recvfrom does.
Program 20.6 modifies Program 20.4 to call the function request_reply_timeout, shown in Program 20.7, instead of calling request_reply. A third command-line argument to this program specifies the number of seconds to wait before timing out.
Program 20.6 client_udp_request_reply_timeout.c
A client program that uses timeouts with request-reply.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "restart.h"
#include "uiciudp.h"
#define BUFSIZE 1024
int request_reply_timeout(int requestfd, void* request, int reqlen,
                          char* server, int serverport, void *reply, int replen,
                          double timeout);
int main(int argc, char *argv[]) {
   ssize_t bytesread, byteswritten;
   char reply[BUFSIZE];
   char request[BUFSIZE];
   int requestfd;
   u_port_t serverport;
   double timeout;
   if (argc != 4) {
      fprintf(stderr, "Usage: %s servername serverport timeout\n", argv[0]);
      return 1;
   }
   serverport = (u_port_t) atoi(argv[2]);
   timeout = atof(argv[3]);
   if ((requestfd = u_openudp(0)) == -1) {   /* create unbound UDP endpoint */
      perror("Failed to create UDP endpoint");
      return 1;
   }
   sprintf(request, "[%ld]\n", (long)getpid());  /* create a request string */
   /* use request-reply protocol with timeout to send a message */
   bytesread = request_reply_timeout(requestfd, request, strlen(request) + 1,
                                     argv[1], serverport, reply, BUFSIZE, timeout);
   if (bytesread == -1)
      perror("Failed to complete request_reply_timeout");
   else {
      byteswritten = r_write(STDOUT_FILENO, reply, bytesread);
      if (byteswritten == -1)
         perror("Failed to echo server reply");
   }
   if ((r_close(requestfd) == -1) || (bytesread == -1) || (byteswritten == -1))
      return 1;
   return 0;
}
Program 20.7 request_reply_timeout.c
Request-reply implementation with timeout.
#include <sys/types.h>
#include "uiciudp.h"
int request_reply_timeout(int requestfd, void* request, int reqlen,
                          char* server, int serverport, void *reply, int replen,
                          double timeout) {
   ssize_t nbytes;
   u_buf_t senderinfo;
   /* send the request */
   nbytes = u_sendtohost(requestfd, request, reqlen, server, serverport);
   if (nbytes == -1)
      return -1;
   /* wait timeout seconds for a response, restart if from wrong server */
   while ((nbytes = u_recvfromtimed(requestfd, reply, replen,
                                    &senderinfo, timeout)) >= 0 &&
          (u_comparehost(&senderinfo, server, serverport) == 0)) ;
   return (int)nbytes;
}
Figure 20.7 shows a state diagram for the request-reply logic of Program 20.7. The circles represent function calls that implement the major steps in the protocol, and the arrows indicate outcomes.
The request_reply_timeout function of Program 20.7 returns an error if the server does not respond after an interval of time. Either the request was not serviced or it was serviced and the reply was lost or never sent. The client cannot distinguish between a lost message and a server crash.
Another potential problem is that Program 20.7 resets the timeout each time it encounters an incorrect responder. In a denial-of-service attack, offenders continually send spurious packets to ports on the attacked machine. Program 20.7 should limit the number of retries before taking some alternative action such as informing the user of a potential problem.
Exercise 20.12
Request-reply protocols can also be implemented over TCP. Why are these implementations usually simpler than UDP implementations? Are there disadvantages to a TCP implementation?
Answer:
Since TCP provides an error-free stream of bytes, the application can use the error-free request-reply protocol shown in Figure 20.5. Another advantage of TCP implementations is that the client has a connection to the server and can signal that it is finished by closing this connection. The server can then release resources that it has allocated to servicing that client's requests. The client can also detect a server crash while it is waiting for a reply. On the downside, TCP implementations incur overhead in setting up the connection.
Usually, implementations of request-reply with timeout have a mechanism for retrying the request a certain number of times before giving up. The state diagram of Figure 20.8 summarizes this approach. The user specifies a maximum number of retries. The application retries the entire request-reply sequence each time a timeout occurs until the number of retries exceeds the specified maximum.
Program 20.8 implements the request-reply protocol of Figure 20.8 for use in a client similar to Program 20.6.
Exercise 20.13
How would the client in Program 20.6 need to be modified to use the protocol in Program 20.8?
Answer:
The client would have to take an extra command-line argument for the number of retries and call request_reply_timeout_retry instead of request_reply_timeout.
Exercise 20.14
Propose a more sophisticated method of handling timeouts than that of Program 20.8. How might a potential infinite loop due to wrong host be handled?
Answer:
The selection of a timeout value is somewhat arbitrary. If the timeout value is large, the application may wait too long before recognizing a problem. However, timeout values that are too short do not account for natural delays that occur in transit over a network. A more sophisticated timeout strategy would lengthen the timeout value on successive retries and perhaps keep statistics about response times to use in setting future timeout values. Often, the timeout value is doubled for each successive timeout. The potential infinite loop for the wrong host might be handled by incorporating a counter for the wrong host condition and returning an error if this condition occurs more than a certain number of times.
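The doubling strategy described in the answer might be sketched as a helper that computes the timeout for a given retry. The name backoff_timeout and the upper bound maxtimeout are assumptions added for illustration; the cap keeps the wait from growing without limit.

```c
/* Hypothetical helper: timeout for retry number `attempt` (0 = first try).
   The initial value is doubled once per retry but never exceeds maxtimeout. */
double backoff_timeout(double initial, int attempt, double maxtimeout) {
   double t = initial;
   int i;

   for (i = 0; (i < attempt) && (t < maxtimeout); i++)
      t *= 2.0;
   return (t < maxtimeout) ? t : maxtimeout;
}
```

In a loop like the one in Program 20.8, the fixed timeout passed to u_recvfromtimed could be replaced by backoff_timeout(timeout, retries, maxtimeout), where maxtimeout would be an additional (hypothetical) parameter.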
Program 20.8 request_reply_timeout_retry.c
Request-reply implementation with timeout and retries.
#include <stdio.h>
#include <errno.h>
#include "uiciudp.h"
int request_reply_timeout_retry(int requestfd, void* request, int reqlen,
                                char* server, int serverport, void *reply, int replen,
                                double timeout, int maxretries) {
   ssize_t nbytes;
   int retries;
   u_buf_t senderinfo;
   retries = 0;
   while (retries < maxretries) {
      /* send process ID to (server, serverport) */
      nbytes = u_sendtohost(requestfd, request, reqlen, server, serverport);
      if (nbytes == -1)
         return -1;                                        /* error on send */
      /* wait timeout seconds for a response, restart if from wrong server */
      while (((nbytes = u_recvfromtimed(requestfd, reply, replen,
                                        &senderinfo, timeout)) >= 0) &&
             (u_comparehost(&senderinfo, server, serverport) == 0)) ;
      if (nbytes >= 0)
         break;
      retries++;
   }
   if (retries >= maxretries) {
      errno = ETIME;
      return -1;
   }
   return (int)nbytes;
}
With the request-reply with timeouts and retries of Program 20.8, the server may execute the same client request multiple times, with multiple repeats being reflected in the logs produced by the server of Program 20.3. Sometimes reexecution of a request produces invalid results, for example, in banking when a client request to credit an account should not be performed multiple times. On the other hand, a client request for information from a static database can be repeated without ill effect. Operations that can be performed multiple times with the same effect are called idempotent operations. The next section introduces a strategy for handling nonidempotent operations.
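A minimal illustration of the distinction, using hypothetical account operations that are not part of any library: a duplicated credit request changes the balance twice, while a duplicated request that sets the balance to a specific value leaves the same final state.

```c
/* Hypothetical account record used only to illustrate idempotence. */
struct account {
   long balance;                  /* in cents */
};

/* Not idempotent: every execution changes the state again. */
void credit(struct account *a, long amount) {
   a->balance += amount;
}

/* Idempotent: executing it once or many times leaves the same state. */
void set_balance(struct account *a, long amount) {
   a->balance = amount;
}
```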
|
20.6 Request-Reply-Acknowledge Protocols
The invocation semantics describe the behavior of a request protocol. The request_reply function of Program 20.5 implements maybe semantics. The request may or may not be executed. In the limit as the maximum number of allowed retries becomes large, Program 20.8 approximates at-least-once semantics. Unless the request represents an idempotent operation, at-least-once semantics may result in incorrect behavior if a particular request is executed more than once.
An alternative is at-most-once semantics, which can be implemented by having the server save the results of previous requests. If a duplicate request comes, the server retransmits the reply without reexecuting the request. To recognize that a request is a duplicate, the client and server must agree on a format for uniquely identifying requests. The server also must save all replies from all requests until it is sure that the respective clients have received the replies. In the request-reply-acknowledge protocol of Figure 20.9, the client sends an acknowledgment to the server after receiving a reply. The server can safely discard the reply after receiving the acknowledgment.
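One way a server might keep saved replies for at-most-once semantics is a small table keyed by a client identifier and a request sequence number. The sketch below is hypothetical: the fixed capacity, the reply size limit, and the use of a single numeric client identifier are all assumptions. In practice the key would also need the client's host address, since process IDs are unique only within one host.

```c
#include <string.h>

#define CACHE_SLOTS 64           /* assumption: fixed capacity */
#define MAX_REPLY 256            /* assumption: replies are small */

struct reply_entry {
   int inuse;
   long clientid;                /* e.g., client process ID */
   long seqnum;                  /* client-assigned request number */
   size_t len;
   char reply[MAX_REPLY];
};

struct reply_cache {
   struct reply_entry slot[CACHE_SLOTS];
};

/* Return the saved reply for (clientid, seqnum), or NULL for a new request. */
struct reply_entry *cache_lookup(struct reply_cache *c, long clientid, long seqnum) {
   int i;

   for (i = 0; i < CACHE_SLOTS; i++)
      if (c->slot[i].inuse && (c->slot[i].clientid == clientid) &&
          (c->slot[i].seqnum == seqnum))
         return &c->slot[i];
   return NULL;
}

/* Save a reply after executing a request; -1 if full or reply too large. */
int cache_store(struct reply_cache *c, long clientid, long seqnum,
                const void *reply, size_t len) {
   int i;

   if (len > MAX_REPLY)
      return -1;
   for (i = 0; i < CACHE_SLOTS; i++)
      if (!c->slot[i].inuse) {
         c->slot[i].inuse = 1;
         c->slot[i].clientid = clientid;
         c->slot[i].seqnum = seqnum;
         c->slot[i].len = len;
         memcpy(c->slot[i].reply, reply, len);
         return 0;
      }
   return -1;
}

/* The acknowledgment arrived, so the saved reply can be discarded. */
void cache_ack(struct reply_cache *c, long clientid, long seqnum) {
   struct reply_entry *e = cache_lookup(c, clientid, seqnum);

   if (e != NULL)
      e->inuse = 0;
}
```

On each request the server would call cache_lookup first. If it finds an entry, it resends the saved reply without reexecuting the request; otherwise it executes the request, calls cache_store, and sends the reply. A later acknowledgment triggers cache_ack.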
Exercise 20.15
Devise a format for a message containing a process ID that could be used in the request-reply-acknowledge protocol of Figure 20.9.
Answer:
One possibility is to use a structure containing the process ID and a sequence number. The client initializes the sequence number to 1 and increments it for each new request. This approach works as long as the sequence numbers and process IDs do not wrap around. Since we are sending the process ID as a string rather than in raw binary form, we can send the sequence number in the same way. The string sent consists of the sequence number followed by a blank followed by the process ID. The server parses this string to separate the two values. If data is sent in raw form rather than as a string, care must be taken to handle differences in byte ordering (big-endian vs. little-endian) between the client and server if the values are used for anything other than uniqueness.
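The string format in the answer can be produced and parsed with snprintf and sscanf. This is a sketch: the helper names are hypothetical, and long is assumed to be wide enough for both values.

```c
#include <stdio.h>

/* Format "<sequence number> <process ID>"; -1 on error or truncation. */
int format_request_id(char *buf, size_t size, long seqnum, long pid) {
   int n = snprintf(buf, size, "%ld %ld", seqnum, pid);

   return ((n < 0) || ((size_t)n >= size)) ? -1 : n;
}

/* Split the string back into its two values; 0 on success, -1 otherwise. */
int parse_request_id(const char *buf, long *seqnum, long *pid) {
   return (sscanf(buf, "%ld %ld", seqnum, pid) == 2) ? 0 : -1;
}
```

For example, sequence number 7 from process 1234 becomes the string "7 1234".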
The server side of the request-reply-acknowledge protocol is more complicated. The server must keep a copy of each reply until it receives the corresponding acknowledgment. If the client fails to send an acknowledgment, say, because of a crash, the server may keep the information forever. Connection-oriented communication is more suitable for this type of communication. TCP implements reliable communication by using a request-reply-acknowledge protocol, including negative acknowledgments and flow control, that is optimized for good performance.
|
20.7 Implementation of UICI UDP
UICI UDP functions use the same name resolution functions, addr2name and name2addr, as the UICI TCP functions. Program C.4 shows implementations of these functions. Compile your source with uiciname.c when using UICI UDP.
20.7.1 Implementation of u_openudp
The UICI UDP function u_openudp takes a port number as its parameter and creates a connectionless socket for communication using UDP. The u_openudp function returns a file descriptor if the communication endpoint was successfully created. Servers call u_openudp with their well-known port as a parameter. Clients generally call u_openudp with a parameter of 0, meaning that they will allow the system to choose an ephemeral port when it becomes necessary. The u_openudp function returns -1 and sets errno if an error occurs.
Program 20.9 implements u_openudp. The u_openudp function uses the socket function discussed on page 631 to create the communication endpoint. As in the case of TCP, the domain is AF_INET and the protocol is 0. The type is SOCK_DGRAM rather than SOCK_STREAM.
If the port number parameter is greater than 0, u_openudp associates the newly created socket with this port number by calling bind, a library function described on page 631.
Program 20.9 u_openudp.c
An implementation of u_openudp.
#include <errno.h>
#include <unistd.h>
#include <sys/socket.h>
#include "restart.h"
#include "uiciudp.h"
int u_openudp(u_port_t port) {
   int error;
   int one = 1;
   struct sockaddr_in server;
   int sock;
   if ((sock = socket(AF_INET, SOCK_DGRAM, 0)) == -1)
      return -1;
   if (setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) == -1) {
      error = errno;
      r_close(sock);
      errno = error;
      return -1;
   }
   if (port > 0) {
      server.sin_family = AF_INET;
      server.sin_addr.s_addr = htonl(INADDR_ANY);
      server.sin_port = htons((short)port);
      if (bind(sock, (struct sockaddr *)&server, sizeof(server)) == -1) {
         error = errno;
         r_close(sock);
         errno = error;
         return -1;
      }
   }
   return sock;
}
Comparing u_openudp with u_open on page 634, we see that bind is called only when the port number is greater than 0. Only a server needs to bind the socket to a particular port. Also, it is not necessary to worry about SIGPIPE. A write to a pipe (or a TCP socket) generates a SIGPIPE signal when there are no active readers. In contrast, UDP provides no information about active receivers. A UDP datagram is considered to be sent correctly when it is successfully copied into the buffers of the network subsystem. UDP does not detect an error when an application sends a datagram to a destination that is not waiting to receive it, so sending does not generate a SIGPIPE.
20.7.2 The sendto function
The POSIX sendto function transmits data as a single datagram and returns the number of transmitted bytes if successful. However, sendto checks only local errors, and success does not mean that the receiver actually got the data.
The first three parameters for sendto have the same meaning as for read and write. The socket parameter holds a file descriptor previously opened by a call to socket. The message parameter has the data to be sent, and length is the number of bytes to send. The flags parameter allows special options that we do not use, so this value is always zero. The dest_addr parameter points to a structure filled with information about the destination, including the remote host address and the remote port number. Since we are using the Internet domain, *dest_addr is a struct sockaddr_in structure. The dest_len is the size of the struct sockaddr_in structure.
SYNOPSIS
#include <sys/socket.h>
ssize_t sendto(int socket, const void *message, size_t length,
               int flags, const struct sockaddr *dest_addr,
               socklen_t dest_len);
POSIX
If successful, sendto returns the number of bytes sent. If unsuccessful, sendto returns -1 and sets errno. The following table lists the mandatory errors for sendto with unconnected sockets.
errno | cause
EAFNOSUPPORT | address family cannot be used with this socket
EAGAIN or EWOULDBLOCK | O_NONBLOCK is set and operation would block
EBADF | socket parameter is not a valid file descriptor
EINTR | sendto interrupted before any data was transmitted
EMSGSIZE | message too large to be sent all at once as required by socket
ENOTSOCK | socket does not refer to a socket
EOPNOTSUPP | specified flags not supported for this type of socket
The sendto function can be used with sockets connected to a particular destination host and port. However, sendto still determines the destination host and port number by the information in the *dest_addr structure, independently of this connection.
If sendto is used on a socket that is not yet bound to a source port, the network subsystem assigns an unused ephemeral port to bind with the socket. Datagrams originating from this socket include the port number and the source host address along with the data so that the remote host can reply.
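The automatic binding can be observed with getsockname. In this sketch (helper names hypothetical), send_to_loopback sends from a socket that has never been bound, and local_port then reports the port the system chose. POSIX leaves the address reported for a socket that has never been bound unspecified, so the port is only inspected after the send.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Local port bound to fd, in host byte order, or -1 on error. */
int local_port(int fd) {
   struct sockaddr_in addr;
   socklen_t len = sizeof(addr);

   memset(&addr, 0, sizeof(addr));
   if (getsockname(fd, (struct sockaddr *)&addr, &len) == -1)
      return -1;
   return ntohs(addr.sin_port);
}

/* Send a datagram to 127.0.0.1:destport. If fd is not yet bound, the
   system binds it to an unused ephemeral port as a side effect. */
ssize_t send_to_loopback(int fd, unsigned short destport,
                         const void *msg, size_t len) {
   struct sockaddr_in dest;

   memset(&dest, 0, sizeof(dest));
   dest.sin_family = AF_INET;
   dest.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
   dest.sin_port = htons(destport);
   return sendto(fd, msg, len, 0, (struct sockaddr *)&dest, sizeof(dest));
}
```

On a fresh socket, send_to_loopback(fd, 9, "x", 1) returns 1 whether or not any process is receiving on port 9, which also illustrates that sendto reports only local errors. A subsequent local_port(fd) should then return a nonzero ephemeral port.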
20.7.3 Implementation of u_sendto and u_sendtohost
The UICI UDP library provides two functions for sending messages, u_sendto and u_sendtohost, shown in Program 20.10. The u_sendtohost function takes the destination host name and port number as parameters. It is meant to be used when initiating a communication with a remote host. The u_sendto function uses a u_buf_t structure that was filled by a previous call to u_recvfrom and is meant to be used in a reply.
Program 20.10 u_sendto.c
An implementation of u_sendto and u_sendtohost.
#include <errno.h>
#include <sys/socket.h>
#include "uiciname.h"
#include "uiciudp.h"
ssize_t u_sendto(int fd, void *buf, size_t nbytes, u_buf_t *ubufp) {
   int len;
   struct sockaddr *remotep;
   int retval;
   len = sizeof(struct sockaddr_in);
   remotep = (struct sockaddr *)ubufp;
   while (((retval = sendto(fd, buf, nbytes, 0, remotep, len)) == -1) &&
          (errno == EINTR)) ;
   return retval;
}
ssize_t u_sendtohost(int fd, void *buf, size_t nbytes, char *hostn,
                     u_port_t port) {
   struct sockaddr_in remote;
   if (name2addr(hostn, &(remote.sin_addr.s_addr)) == -1) {
      errno = EINVAL;
      return -1;
   }
   remote.sin_port = htons((short)port);
   remote.sin_family = AF_INET;
   return u_sendto(fd, buf, nbytes, &remote);
}
The u_sendto function is almost identical to sendto except that u_sendto restarts if interrupted by a signal. The u_buf_t data type is defined in uiciudp.h by a typedef that sets it to be equivalent to struct sockaddr_in. This allows a u_buf_t pointer to be cast to a struct sockaddr pointer in the implementation of u_sendto. The user does not need to know anything about the internal representation of the u_buf_t structure, provided that its value was set by u_recvfrom or u_recvfromtimed.
The u_sendtohost function uses name2addr from uiciname.c to convert the host name to an address. If the host name begins with a digit, name2addr assumes that it is an IP address in dotted form and calls inet_addr to decode it. Otherwise, name2addr resolves the host name and stores the remote host address in the struct sockaddr_in. The u_sendtohost function fills in the port number and address family and calls u_sendto. Since name2addr does not set errno when an error occurs, u_sendtohost sets errno to EINVAL when name2addr returns an error.
20.7.4 The recvfrom function
The POSIX recvfrom function blocks until a datagram becomes available on a file descriptor representing an open socket. While it is possible to use recvfrom with TCP sockets, we consider only UDP SOCK_DGRAM sockets. Be sure to associate the socket with a port, either by explicitly calling bind or by calling sendto, which forces a binding to an ephemeral port. A call to recvfrom on a socket that has not been bound to a port may hang indefinitely.
The buffer parameter of recvfrom points to a user-provided buffer of length bytes that receives the datagram data. The amount of data received is limited by the length parameter. If the datagram is larger than length, recvfrom truncates the message to size length and drops the rest of the datagram. In either case, recvfrom returns the number of bytes of data placed in buffer.
The *address structure is a user-provided struct sockaddr structure that recvfrom fills in with the address of the sender. If address is NULL, recvfrom does not return sender information. The address_len parameter is a pointer to a value-result parameter. Set *address_len to the length of address before calling recvfrom. On return, recvfrom sets *address_len to the actual length of *address. The address_len parameter prevents buffer overflows because recvfrom truncates the sender information to fit in *address. It is not considered an error if the information put in *address is truncated, so be sure that the buffer is large enough. For our purposes, the buffer should be able to hold a struct sockaddr_in structure.
SYNOPSIS
#include <sys/socket.h>
ssize_t recvfrom(int socket, void *restrict buffer, size_t length,
                 int flags, struct sockaddr *restrict address,
                 socklen_t *restrict address_len);
POSIX
If successful, recvfrom returns the number of bytes that were received. If unsuccessful, recvfrom returns -1 and sets errno. The following table lists the mandatory errors for recvfrom with an unconnected socket.
errno | cause
EAGAIN or EWOULDBLOCK | O_NONBLOCK is set and no data is waiting to be received, or MSG_OOB is set and no out-of-band data is available and either O_NONBLOCK is set or socket does not support blocking with out-of-band data
EBADF | socket is not a valid file descriptor
EINTR | recvfrom interrupted by a signal before any data was available
EINVAL | MSG_OOB is set and no out-of-band data is available
ENOTSOCK | socket does not refer to a socket
EOPNOTSUPP | specified flags not supported for this type of socket
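The truncation and value-result rules described above can be observed on the loopback interface. In this sketch (helper names hypothetical), open_loopback_udp creates a receiving socket on an ephemeral port, and recv_datagram resets *fromlen to the buffer size before calling recvfrom.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: UDP socket bound to 127.0.0.1 on an ephemeral port.
   Stores the chosen port (network byte order) in *port; returns fd or -1. */
int open_loopback_udp(in_port_t *port) {
   struct sockaddr_in addr;
   socklen_t len = sizeof(addr);
   int fd, error;

   if ((fd = socket(AF_INET, SOCK_DGRAM, 0)) == -1)
      return -1;
   memset(&addr, 0, sizeof(addr));
   addr.sin_family = AF_INET;
   addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
   addr.sin_port = htons(0);                    /* let the system choose */
   if ((bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) ||
       (getsockname(fd, (struct sockaddr *)&addr, &len) == -1)) {
      error = errno;
      close(fd);
      errno = error;
      return -1;
   }
   *port = addr.sin_port;
   return fd;
}

/* Receive one datagram of at most size bytes; any excess is discarded.
   *fromlen is value-result: set to sizeof(*from) before the call and
   holding the actual sender-address length afterward. */
ssize_t recv_datagram(int fd, void *buf, size_t size,
                      struct sockaddr_in *from, socklen_t *fromlen) {
   ssize_t n;

   *fromlen = sizeof(*from);
   while (((n = recvfrom(fd, buf, size, 0, (struct sockaddr *)from,
                         fromlen)) == -1) && (errno == EINTR))
      ;
   return n;
}
```

If a sender transmits a 10-byte datagram followed by a 2-byte datagram, two calls to recv_datagram with a 4-byte buffer return 4 and then 2: the last 6 bytes of the first datagram are discarded rather than delivered by the second call.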
20.7.5 Implementation of u_recvfrom and u_recvfromtimed
Program 20.11 implements u_recvfrom. It is similar to recvfrom except that it restarts recvfrom if interrupted by a signal. The returned sender information is encapsulated in the u_buf_t parameter, which is used as an opaque object for a reply, using u_sendto, to the sender. If successful, u_recvfrom returns the number of bytes received. If unsuccessful, u_recvfrom returns -1 and sets errno. Since UDP datagrams of length 0 are valid, a return value of 0 indicates a datagram of length 0 and should not be interpreted as end-of-file.
Program 20.11 u_recvfrom.c
An implementation of u_recvfrom.
#include <errno.h>
#include <sys/socket.h>
#include "uiciudp.h"
ssize_t u_recvfrom(int fd, void *buf, size_t nbytes, u_buf_t *ubufp) {
   socklen_t len;                     /* value-result argument for recvfrom */
   struct sockaddr *remote;
   ssize_t retval;
   len = sizeof (struct sockaddr_in);
   remote = (struct sockaddr *)ubufp;
   while (((retval = recvfrom(fd, buf, nbytes, 0, remote, &len)) == -1) &&
          (errno == EINTR)) ;
   return retval;
}
Since UDP is not reliable, a datagram can be lost without generating an error for either the sender or the receiver. More reliable protocols based on UDP use some form of request-reply or request-reply-acknowledge protocol discussed in Sections 20.4 through 20.6. These protocols require that the receiver not block indefinitely waiting for messages or replies. The u_recvfromtimed function returns after a specified time if it does not receive a datagram. If successful, u_recvfromtimed returns the number of bytes written in *buf. If a timeout occurs, u_recvfromtimed returns -1 and sets errno to ETIME. For other errors, u_recvfromtimed returns -1 and sets errno to the same values as u_recvfrom does.
Strategies for implementing timeouts include socket timeout options, signals, and select. Unfortunately, the socket options supporting timeouts are not universally available. The signal strategy uses a timer to generate a signal after a specified time. When a signal is caught, recvfrom returns with the error EINTR. The use of signals may interfere with other timers that a program might be using.
Program 20.12 implements u_recvfromtimed with the waitfdtimed function from the restart library. The implementation of waitfdtimed using select is shown in Program 4.15 on page 114. The waitfdtimed function takes two parameters: a file descriptor and an ending time. The add2currenttime function from the restart library converts the timeout interval into an ending time. Using the ending time rather than directly using the time interval allows waitfdtimed to restart if interrupted by a signal and still retain the same ending time for the timeout.
Program 20.12 u_recvfromtimed.c
An implementation of u_recvfromtimed.
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include "restart.h"
#include "uiciudp.h"
ssize_t u_recvfromtimed(int fd, void *buf, size_t nbytes, u_buf_t *ubufp,
                        double seconds) {
   socklen_t len;                     /* value-result argument for recvfrom */
   struct sockaddr *remote;
   ssize_t retval;
   struct timeval timedone;
   timedone = add2currenttime(seconds);
   if (waitfdtimed(fd, timedone) == -1)
      return (ssize_t)(-1);
   len = sizeof (struct sockaddr_in);
   remote = (struct sockaddr *)ubufp;
   while (((retval = recvfrom(fd, buf, nbytes, 0, remote, &len)) == -1) &&
          (errno == EINTR)) ;
   return retval;
}
Exercise 20.16
Suppose you call u_recvfromtimed with a timeout of 2 seconds and 10 signals come in 1 second apart. When does u_recvfromtimed time out if no data arrives?
Answer:
It still times out 2 seconds after it is called. The reason is that waitfdtimed times out at a given ending time, independently of the number of times it needs to restart.
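The ending-time approach can be sketched with select and a monotonic clock. The names deadline_after, wait_until_readable, and recvfrom_deadline are hypothetical analogues of add2currenttime, waitfdtimed, and u_recvfromtimed; they are not the restart library's code, and the use of CLOCK_MONOTONIC is a choice made for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>
#include <sys/select.h>
#include <sys/socket.h>

#ifndef ETIME
#define ETIME ETIMEDOUT          /* fallback where ETIME is not visible */
#endif

/* Hypothetical analogue of add2currenttime: absolute ending time. */
struct timespec deadline_after(double seconds) {
   struct timespec t;
   long nsec;

   clock_gettime(CLOCK_MONOTONIC, &t);
   t.tv_sec += (time_t)seconds;
   nsec = t.tv_nsec + (long)((seconds - (double)(time_t)seconds) * 1e9);
   t.tv_sec += nsec / 1000000000L;
   t.tv_nsec = nsec % 1000000000L;
   return t;
}

/* Hypothetical analogue of waitfdtimed: 0 when fd is readable, -1 with
   errno ETIME once the deadline passes. After EINTR the remaining time is
   recomputed from the same deadline, so signals cannot stretch the wait. */
int wait_until_readable(int fd, struct timespec end) {
   struct timespec now;
   struct timeval left;
   fd_set readset;
   long long usec;
   int n;

   for ( ; ; ) {
      clock_gettime(CLOCK_MONOTONIC, &now);
      usec = (long long)(end.tv_sec - now.tv_sec) * 1000000LL +
             (end.tv_nsec - now.tv_nsec) / 1000;
      if (usec <= 0) {
         errno = ETIME;
         return -1;
      }
      left.tv_sec = (time_t)(usec / 1000000LL);
      left.tv_usec = (suseconds_t)(usec % 1000000LL);
      FD_ZERO(&readset);
      FD_SET(fd, &readset);
      n = select(fd + 1, &readset, NULL, NULL, &left);
      if (n > 0)
         return 0;
      if ((n == -1) && (errno != EINTR))
         return -1;
      /* n == 0 or EINTR: loop and recheck the same deadline */
   }
}

/* A u_recvfromtimed-style receive built from the two helpers above. */
ssize_t recvfrom_deadline(int fd, void *buf, size_t len, double seconds) {
   ssize_t n;

   if (wait_until_readable(fd, deadline_after(seconds)) == -1)
      return -1;
   while (((n = recvfrom(fd, buf, len, 0, NULL, NULL)) == -1) &&
          (errno == EINTR))
      ;
   return n;
}
```

Because the deadline is computed once, a signal that interrupts select only causes the remaining time to be recomputed, which is the behavior described in Exercise 20.16.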
20.7.6 Host names and u_buf_t
The UICI UDP library also provides three functions that a receiver can use to examine the host information stored in a u_buf_t structure. The u_gethostname function, which can be called after u_recvfrom or u_recvfromtimed, creates a string that corresponds to the name of a host. The first parameter of u_gethostname is a u_buf_t structure previously set, for example, by u_recvfrom. The u_gethostname function returns a null-terminated string containing the name of the host in the user-supplied buffer hostn. The third parameter of u_gethostname is the length of hostn. The u_gethostname function truncates the host name so that it fits.
The implementation of u_gethostname in Program 20.13 just calls addr2name and sets its *hostn buffer to the result. Recall that if addr2name cannot convert the address to a host name, it sets *hostn to the dotted-decimal representation of the host address. The addr2name function never returns an error.
Program 20.13 u_gethostname.c
An implementation of u_gethostname.
#include "uiciname.h"
#include "uiciudp.h"
void u_gethostname(u_buf_t *ubufp, char *hostn, int hostnsize) {
   struct sockaddr_in *remotep;
   remotep = (struct sockaddr_in *)ubufp;
   addr2name(remotep->sin_addr, hostn, hostnsize);
}
The u_gethostinfo function is similar to u_gethostname but is meant primarily for debugging. The u_gethostinfo function fills in a printable string with both the host name and port number corresponding to a u_buf_t structure. Program 20.14 implements u_gethostinfo.
Program 20.14 u_gethostinfo.c
An implementation of u_gethostinfo.
#include <stdio.h>
#include "uiciudp.h"
#define BUFSIZE 1024
void u_gethostinfo(u_buf_t *ubufp, char *info, int infosize) {
   int len;
   int portnumber;
   portnumber = ntohs(ubufp->sin_port);
   len = snprintf(info, infosize, "port number is %d on host ", portnumber);
   info[infosize-1] = 0;                          /* in case name did not fit */
   if (len >= infosize) return;
   u_gethostname(ubufp, info+len, infosize-len);
}
The function u_comparehost returns 1 if the given host name and port number match the information given in a u_buf_t structure, *ubufp, and 0 otherwise. The u_comparehost function first checks that the port numbers agree and returns 0 if they do not. Otherwise, u_comparehost calls name2addr to convert the host name to an address and compares the result to the address stored in ubufp. Program 20.15 implements u_comparehost.
Program 20.15 u_comparehost.c
An implementation of u_comparehost.
#include <string.h>
#include <sys/socket.h>
#include "uiciname.h"
#include "uiciudp.h"
int u_comparehost(u_buf_t *ubufp, char *hostn, u_port_t port) {
   in_addr_t addr;
   struct sockaddr_in *remotep;
   remotep = (struct sockaddr_in *)ubufp;
   if ((port != ntohs(remotep->sin_port)) ||
       (name2addr(hostn, &addr) == -1) ||
       (memcmp(&(remotep->sin_addr.s_addr), &addr, sizeof(in_addr_t)) != 0))
      return 0;
   return 1;
}
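If the server is always named by a dotted-decimal address, the comparison can be done without name resolution. The following sketch (compare_sender is a hypothetical name) checks the port before converting the address, as u_comparehost does, but uses inet_pton instead of name2addr and therefore rejects host names.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Return 1 if sender matches (dotted, port), 0 otherwise. */
int compare_sender(const struct sockaddr_in *sender, const char *dotted,
                   unsigned short port) {
   struct in_addr addr;

   if (port != ntohs(sender->sin_port))
      return 0;                          /* cheapest check first */
   if (inet_pton(AF_INET, dotted, &addr) != 1)
      return 0;                          /* not a dotted IPv4 address */
   return memcmp(&sender->sin_addr, &addr, sizeof(addr)) == 0;
}
```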
20.8 Comparison of UDP and TCP
Both UDP and TCP are standard protocols used by applications to send information over a network. The choice of which to use for a given application depends on the design goals of the application. This section summarizes the main differences between UDP and TCP from the viewpoint of the application.
TCP is connection-oriented and UDP is not. To send over a TCP communication endpoint, a client first makes a connection request and the server accepts it. Once the client and server have established the connection, they can enjoy symmetric bidirectional communication with standard read and write functions. The endpoints are associated with the client and server pair. Either side can close the connection, in which case the other side finds out about it when it tries to read or write. Thus, applications communicating with TCP can tell when the other side is done. In contrast, an application can use a UDP communication endpoint to send to or receive from anyone. Each message must include the destination address (usually an IP address and port number). UDP does not provide an application with knowledge about the status of the remote end.
UDP is based on messages, and TCP is based on byte streams. If an application sends a UDP message with a single sendto, then (if the buffer is large enough) a call to recvfrom on the destination endpoint either retrieves the entire message or nothing at all. (Remember that we only consider unconnected UDP sockets.) In contrast, an application that sends a block of data with a single TCP write has no guarantee that the receiver retrieves the entire block in a single read. A single read retrieves a contiguous sequence of bytes in the stream. This sequence may contain all or part of the block or may extend over several blocks.
TCP delivers streams of bytes in the same order in which they were sent. UDP can deliver messages out of order, even if no errors occur anywhere in the network. UDP delivers messages to the application in the order they are received. Since individual UDP packets may travel different routes on the Internet, they may not arrive in the order they were sent. In contrast, the network subsystem of the receiving host buffers TCP packets and uses sequence numbers to deliver bytes to the application in the order they were sent.
TCP is reliable and UDP is unreliable. If TCP cannot deliver data to the remote host, it eventually reports the failure by returning an error. The network might drop UDP packets and never deliver them to the remote host, and UDP does not notify either the sender or the receiver that an error has occurred. The UDP sendto and the TCP write functions return after successfully copying their message into a buffer of the network subsystem. The point of return for UDP does not depend in any way on the status of the receiver. For TCP, the point of return depends indirectly on the status of the receiver and the network. The TCP network subsystem may hold outgoing data in its buffers because the receiving host has no available buffers, the receiver has not acknowledged packets, or the network is congested. The held data may cause subsequent TCP write calls to block. Although TCP has flow control, you should not interpret a return from a TCP write call as an indication that data has arrived at the destination host.
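Because a single TCP read may return only part of a block, a receiver that expects a fixed-size block typically reads in a loop. The sketch below (read_fully is a hypothetical name and is not meant to reproduce any restart library function) works on any byte-stream descriptor, such as a TCP socket or a pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/* Read exactly size bytes unless end-of-file or an error intervenes.
   Returns the number of bytes read (less than size only at EOF), or -1. */
ssize_t read_fully(int fd, void *buf, size_t size) {
   char *p = buf;
   size_t total = 0;
   ssize_t n;

   while (total < size) {
      n = read(fd, p + total, size - total);
      if (n == -1) {
         if (errno == EINTR)
            continue;            /* restart after a signal */
         return -1;
      }
      if (n == 0)
         break;                  /* peer closed: stream ended early */
      total += (size_t)n;
   }
   return (ssize_t)total;
}
```

A UDP receiver needs no such loop, since each successful recvfrom returns data from exactly one datagram.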
|
20.9 Multicast
The connectionless protocols that we have been discussing thus far are unicast or point-to-point, meaning that each message is sent to a single communication endpoint. In multicast, by contrast, a single send call can cause the delivery of a message to multiple communication endpoints.
Multicasting, which is usually implemented over an existing network structure, supports the abstraction of a group of processes that receive the same messages. Reliable multicast delivers messages exactly once to the members of the group. Ordered multicast delivers messages to each group member in the same order.
This section focuses on low-level IP multicasting available to applications through UDP sockets. Unlike unicast operations, several processes on the same host can receive messages on communication endpoints bound to the same multicast port.
IP multicast groups are identified by a particular IP address. A process joins a multicast group by binding a UDP socket (SOCK_DGRAM) to its multicast address and by setting appropriate socket options. The socket options inform the network interface that incoming messages for the indicated multicast address should be forwarded to the socket. If several processes on the same machine have joined a multicast group, the network interface duplicates each incoming message for all group members. The socket options also cause the host to inform LAN routers that processes on this host have joined the group. If a multicast message arrives at a LAN router, the router forwards the message on all LANs that have at least one host with a member process.
20.9.1 Multicast Addressing
This book discusses only IPv4 multicast. IPv4 multicast addresses are in the range 224.0.0.0 through 239.255.255.255. IPv4 hosts and routers are not required to support multicasting. Hosts that support multicasting must join the all-hosts group 224.0.0.1. Routers that support multicasting must join the all-routers group 224.0.0.2. The addresses used to specify multicast groups are divided into three groups according to the scope of the group. The multicast scope refers to how far from the source multicast messages should be distributed.
Link-local multicast addresses are in the range 224.0.0.0 through 224.0.0.255. Link-local addresses are only for machines connected at the lowest level of topology of the network. Multicast messages with these addresses are not forwarded by a multicast router.
Global multicast addresses are in the range 224.0.1.0 to 238.255.255.255. Global addresses should be forwarded by all multicast routers. Currently, multicast is not truly global because some routers do not support multicast and many router administrators have disabled global multicast for security reasons. Also, there is no political mechanism for reserving a global multicast address and port.
Addresses in the rest of the range, 239.0.0.0 to 239.255.255.255, are called administratively scoped multicast addresses. These addresses are meant to be used inside an organization. They should not be forwarded outside the administrative control of the organization, since they are not guaranteed to be unique.
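The three scope ranges can be checked mechanically. This sketch (classify_mcast and the enumeration names are hypothetical) classifies a dotted-decimal IPv4 address by comparing it, in host byte order, with the range boundaries given above.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

enum mcast_scope { MC_NOT_MULTICAST, MC_LINK_LOCAL, MC_GLOBAL,
                   MC_ADMIN_SCOPED, MC_BAD_ADDRESS };

/* Classify a dotted-decimal IPv4 address by multicast scope. */
enum mcast_scope classify_mcast(const char *dotted) {
   struct in_addr a;
   uint32_t h;

   if (inet_pton(AF_INET, dotted, &a) != 1)
      return MC_BAD_ADDRESS;
   h = ntohl(a.s_addr);
   if ((h < 0xE0000000u) || (h > 0xEFFFFFFFu))  /* outside 224.0.0.0-239.255.255.255 */
      return MC_NOT_MULTICAST;
   if (h <= 0xE00000FFu)                          /* 224.0.0.0-224.0.0.255 */
      return MC_LINK_LOCAL;
   if (h >= 0xEF000000u)                          /* 239.0.0.0-239.255.255.255 */
      return MC_ADMIN_SCOPED;
   return MC_GLOBAL;                              /* 224.0.1.0-238.255.255.255 */
}
```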
Table 20.2 gives the prototypes of the two UICI UDP functions needed to support multicast communication. The u_join function creates a UDP socket and calls the socket options needed for the socket to join a particular multicast group. The u_leave function calls a socket option to leave the multicast group. After u_leave returns, the socket is still open and bound to the same port, but it can no longer receive multicast messages.
Table 20.2. Summary of UICI UDP multicast calls.
UICI UDP prototype | description
int u_join(char *IP_address, u_port_t port, u_buf_t *mcast_info) | creates a UDP socket for multicast, binds the socket to port, and returns the socket file descriptor
int u_leave(int fd, u_buf_t *mcast_info) | leaves the multicast group
The IP_address parameter of u_join holds a string representing the multicast address in dotted form. The port parameter is the multicast port number. The mcast_info parameter points to a user-supplied u_buf_t structure. If successful, u_join returns the file descriptor of the newly created socket and fills in the user-supplied u_buf_t structure with the multicast address for later use with u_sendto or u_leave. If successful, u_leave returns 0. If unsuccessful, u_join and u_leave return -1 and set errno.
The u_join function sets up a socket that can both send to and receive from the multicast group, but a socket does not have to belong to a multicast group to send to it. The simple UDP client in Program 20.2 can be used for sending. All that is necessary is for sendto to use a valid multicast destination address.
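As a sketch of sending without joining, the function below (the name `mcast_sendonce` is hypothetical) checks that the destination is a multicast address, sets the `IP_MULTICAST_TTL` socket option to limit how far the datagram travels, and calls `sendto` on an ordinary UDP socket. `IP_MULTICAST_TTL` is widely available (Linux, BSD) but is not part of POSIX, so its use here is an assumption about your platform.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send one datagram to group:port without joining.
   Returns the number of bytes sent, or -1 with errno set. */
ssize_t mcast_sendonce(const char *group, unsigned short port,
                       const void *buf, size_t len, unsigned char ttl) {
    struct sockaddr_in dest;
    ssize_t n;
    int fd, saved;

    memset(&dest, 0, sizeof(dest));
    dest.sin_family = AF_INET;
    dest.sin_port = htons(port);
    /* reject anything that is not a dotted address in 224.0.0.0/4 */
    if (inet_pton(AF_INET, group, &dest.sin_addr) != 1 ||
        (ntohl(dest.sin_addr.s_addr) >> 28) != 0xEu) {
        errno = EINVAL;
        return -1;
    }
    if ((fd = socket(AF_INET, SOCK_DGRAM, 0)) == -1)
        return -1;
    /* assumes the platform supports IP_MULTICAST_TTL (not in POSIX);
       a TTL of 1 keeps the datagram on the local network */
    if (setsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl)) == -1)
        n = -1;
    else    /* no IP_ADD_MEMBERSHIP is needed just to send */
        n = sendto(fd, buf, len, 0, (struct sockaddr *)&dest, sizeof(dest));
    saved = errno;           /* preserve errno across close */
    close(fd);
    errno = saved;
    return n;
}
```

On a host with a multicast route, a call such as mcast_sendonce("239.1.2.3", 5000, msg, strlen(msg), 1) should reach receivers on the local network that have joined 239.1.2.3 on port 5000, such as the receiver in Program 20.16.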
Program 20.16 shows a program that receives multicast messages. It takes two command-line arguments: the multicast IP address in dotted form and the multicast port number. The program first joins the multicast group with u_join and then echoes what it receives to standard output along with the name of the sending host.
Program 20.16 multicast_receiver.c
A multicast receiver that echoes what it receives to standard output.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "restart.h"
#include "uiciudp.h"
#define BUFSIZE 1024

int main(int argc, char *argv[]) {
    char buf[BUFSIZE];
    ssize_t bytesread;
    char hostinfo[BUFSIZE];
    int mcastfd;
    u_buf_t mcastinfo;
    u_port_t mcastport;
    u_buf_t senderinfo;

    if (argc != 3) {
        fprintf(stderr, "Usage: %s multicast-address multicast-port\n", argv[0]);
        return 1;
    }
    mcastport = (u_port_t)atoi(argv[2]);          /* join the multicast group */
    if ((mcastfd = u_join(argv[1], mcastport, &mcastinfo)) == -1) {
        perror("Failed to join multicast group");
        return 1;
    }
    u_gethostinfo(&mcastinfo, buf, BUFSIZE);
    fprintf(stderr, "Info: %s\n", buf);
    fprintf(stderr, "mcastfd is %d\n", mcastfd);
                   /* read information from multicast, send to standard output */
    while ((bytesread = u_recvfrom(mcastfd, buf, BUFSIZE, &senderinfo)) > 0) {
        u_gethostinfo(&senderinfo, hostinfo, BUFSIZE);
        if ((r_write(STDOUT_FILENO, hostinfo, strlen(hostinfo)) == -1) ||
            (r_write(STDOUT_FILENO, buf, bytesread) == -1)) {
            perror("Failed to echo message received to standard output");
            break;
        }
    }
    return 0;
}
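Program 20.16 relies on UICI's u_gethostinfo to describe the sender. Without UICI, a similar line can be built from the `struct sockaddr_in` that `recvfrom` fills in. The hypothetical `format_sender` below is a sketch, not UICI's implementation: it uses `inet_ntop`, so it reports the numeric address and port rather than a host name.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: write "a.b.c.d:port\n" for a sender into out.
   Returns 0 on success, -1 if conversion fails or out is too small. */
int format_sender(const struct sockaddr_in *sender, char *out, size_t outlen) {
    char ip[INET_ADDRSTRLEN];
    int n;

    if (inet_ntop(AF_INET, &sender->sin_addr, ip, sizeof(ip)) == NULL)
        return -1;
    /* sin_port is in network byte order; convert before printing */
    n = snprintf(out, outlen, "%s:%u\n", ip, (unsigned)ntohs(sender->sin_port));
    if (n < 0 || (size_t)n >= outlen)
        return -1;           /* output was truncated */
    return 0;
}
```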
20.9.2 Implementation of u_join
Program 20.17 implements the u_join function. The function first calls u_openudp to create a UDP socket bound to the requested port. It then joins the multicast group by calling setsockopt with level IPPROTO_IP, option name IP_ADD_MEMBERSHIP, and an option value specifying the multicast address. This option tells the host's network subsystem to accept packets addressed to the multicast group and deliver them to the socket. The application can then use u_sendto and u_recvfrom (and the underlying sendto and recvfrom) as before.
Program 20.17 u_join.c
An implementation of u_join.
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include "uiciudp.h"

int u_join(char *IP_address, u_port_t port, u_buf_t *ubufp) {
    int error;
    int mcastfd;
    struct ip_mreq tempaddress;

    if ((mcastfd = u_openudp(port)) == -1)
        return mcastfd;
    tempaddress.imr_multiaddr.s_addr = inet_addr(IP_address);
    tempaddress.imr_interface.s_addr = htonl(INADDR_ANY);
                   /* join the multicast group; let kernel choose the interface */
    if (setsockopt(mcastfd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                   &tempaddress, sizeof(tempaddress)) == -1) {
        error = errno;            /* close the socket so it does not leak */
        close(mcastfd);
        errno = error;
        return -1;
    }
    ubufp->sin_family = AF_INET;
    ubufp->sin_addr.s_addr = inet_addr(IP_address);
    ubufp->sin_port = htons((short)port);
    return mcastfd;
}
20.9.3 Implementation of u_leave
Program 20.18 implements the u_leave function. The u_leave function informs the network subsystem that the application is no longer participating in the multicast group by calling setsockopt with the IP_DROP_MEMBERSHIP option. Since u_leave does not close the socket, the socket can still send multicast messages and receive non-multicast messages.
Program 20.18 u_leave.c
An implementation of u_leave.
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include "uiciudp.h"

int u_leave(int mcastfd, u_buf_t *ubufp) {
    struct ip_mreq tempaddress;

    memcpy(&(tempaddress.imr_multiaddr),
           &(ubufp->sin_addr), sizeof(struct in_addr));
    tempaddress.imr_interface.s_addr = htonl(INADDR_ANY);
    return setsockopt(mcastfd, IPPROTO_IP, IP_DROP_MEMBERSHIP,
                      &tempaddress, sizeof(tempaddress));
}
20.10 Exercise: UDP Port Server
This exercise describes a server that uses UDP to provide information about the services that are available on the host on which it is running. Start by reading the man page for getservbyname if this function is available on your system. Also, get a copy of the netdb.h header file. If your system does not support getservbyname, your server should use a table of your own construction.
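As a starting point for the server side, the sketch below shows one way to use `getservbyname`. The function name `lookup_service` and the alias-packing convention (aliases stored back to back, each null-terminated, with an empty string ending the list) are assumptions for illustration. Note that `getservbyname` returns the port in network byte order and is not guaranteed to be thread-safe.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical server-side helper: look up service/proto.
   Returns the port in host byte order and packs the aliases into names,
   or returns -1 if the service is unknown. */
int lookup_service(const char *service, const char *proto,
                   char *names, size_t namesize) {
    struct servent *sp;
    size_t used = 0, n;
    char **alias;

    memset(names, 0, namesize);          /* an empty string ends the list */
    if ((sp = getservbyname(service, proto)) == NULL)
        return -1;                       /* unknown service: port -1 */
    for (alias = sp->s_aliases; *alias != NULL; alias++) {
        n = strlen(*alias) + 1;          /* include the terminator */
        if (used + n >= namesize)        /* keep room for the final '\0' */
            break;
        memcpy(names + used, *alias, n);
        used += n;
    }
    return (int)ntohs((uint16_t)sp->s_port);  /* s_port is network order */
}
```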
Design a "service server" that allows clients to find out which services are available on a host. The client sends a UDP request containing the following.
Sequence number (an integer in network byte order)
Protocol name (a null-terminated string)
Name of the service (a null-terminated string)
The server returns a response containing the following information.
Same sequence number as in the request
Integer port number (in network byte order)
Set of null-terminated strings giving aliases of the service
If the host does not support the service, the server should return -1 for the port number. For simplicity, use the following structure for both the request and the response.
#define NAMESIZE 256
struct service {
    int sequence;
    int port;
    char names[NAMESIZE];
} hostsev;
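Sending the structure with a single sendto of sizeof(hostsev) bytes assumes that both hosts agree on the size and byte order of int. A safer sketch marshals each field explicitly, under the assumption that the wire format is two 32-bit big-endian integers followed by the NAMESIZE-byte names area. The names `service_marshal`, `service_unmarshal`, and `SERVICE_WIRESIZE` are hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

#define NAMESIZE 256

struct service {              /* fields kept in host byte order */
    int sequence;
    int port;
    char names[NAMESIZE];
};

/* assumed wire layout: 4-byte sequence, 4-byte port, names area */
#define SERVICE_WIRESIZE (4 + 4 + NAMESIZE)

/* Copy s into buf in network byte order; returns bytes written. */
size_t service_marshal(const struct service *s, unsigned char *buf) {
    uint32_t v;

    v = htonl((uint32_t)s->sequence);
    memcpy(buf, &v, 4);
    v = htonl((uint32_t)s->port);      /* -1 travels as 0xFFFFFFFF */
    memcpy(buf + 4, &v, 4);
    memcpy(buf + 8, s->names, NAMESIZE);
    return SERVICE_WIRESIZE;
}

/* Rebuild s from buf; returns 0, or -1 if len is too short. */
int service_unmarshal(struct service *s, const unsigned char *buf, size_t len) {
    uint32_t v;

    if (len < SERVICE_WIRESIZE)
        return -1;
    memcpy(&v, buf, 4);
    s->sequence = (int)ntohl(v);
    memcpy(&v, buf + 4, 4);
    s->port = (int)ntohl(v);           /* two's complement: back to -1 */
    memcpy(s->names, buf + 8, NAMESIZE);
    s->names[NAMESIZE - 1] = '\0';     /* never trust the peer to terminate */
    return 0;
}
```

Copying through a uint32_t with memcpy also avoids alignment problems when the bytes arrive in an arbitrary receive buffer.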
Write a UDP test client that prompts the user for host information, protocol and service name. The client chooses a sequence number at random, marshals the request (puts it in the form of the preceding structure), and sends it to the server.
The UDP client should take three command-line arguments: the name of the host running the service server, the UDP port number of the service server, and the client's timeout value. The client either receives a response from the server or times out before prompting the user for another request. If the sequence number of a received response does not match the sequence number of the most recent request, the client should print the response, noting the mismatch, and resume waiting for the server to respond. As part of your testing, set a very short timeout in the client and insert a delay in the server between the receipt of a request and the response. The delay causes the response to an earlier request to arrive while the client is waiting for the response to a later one. During testing, run several servers on different machines and have multiple clients accessing different servers in turn.
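The client's timeout can be implemented by waiting for the socket to become readable before calling recv. The sketch below uses `poll`; the function name `recv_timeout` and the choice to report a timeout as -1 with errno set to `ETIMEDOUT` are assumptions, not part of UICI.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: wait up to timeout_ms milliseconds for one datagram.
   Returns bytes received, or -1 with errno set (ETIMEDOUT on timeout).
   Restarts poll after a signal but does not shorten the remaining wait. */
ssize_t recv_timeout(int fd, void *buf, size_t len, int timeout_ms) {
    struct pollfd pfd;
    int ready;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    while ((ready = poll(&pfd, 1, timeout_ms)) == -1 && errno == EINTR)
        ;
    if (ready == -1)
        return -1;
    if (ready == 0) {            /* nothing arrived in time */
        errno = ETIMEDOUT;
        return -1;
    }
    return recv(fd, buf, len, 0);
}
```

The client would call this in a loop: after each datagram arrives, unmarshal it, compare its sequence number with that of the most recent request, and on a mismatch print the response and call recv_timeout again.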
20.11 Exercise: Stateless File Server
This exercise describes the implementation of a simple stateless file server based on UDP. A stateless server is one for which client requests are completely self-contained and leave no residual state information on the server. Not all problems can be cast in stateless form, but there are some well-known examples of stateless servers. Sun NFS (Network File System) is implemented as a stateless client-server system based on unreliable remote procedure calls (RPCs).
Program 20.19 shows a putblock function that writes a block of data to a specified file. Although the normal write function assumes an open file descriptor and manipulates a file pointer, the putblock function is stateless. The stateless form of file access does not assume that a file descriptor has been previously opened and does not leave file descriptors open after servicing the request.
Exercise 20.17
An idempotent operation is one that has the same effect whether it is performed once or many times. Is the putblock operation of Program 20.19 idempotent?
Answer:
Although the contents of the file will not change if putblock is called multiple times with the same parameters, putblock is not strictly idempotent because the modification date changes.
Exercise 20.18
Write a getblock function that is similar to putblock. Is getblock idempotent?
Answer:
POSIX specifies that the struct stat structure have a time_t st_atime field giving the time that a file was last accessed. Because reading a file normally updates this field, getblock is not strictly idempotent.
Program 20.19 putblock.c
Implementation of a stateless write to a file.
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>
#include "restart.h"
#define BLKSIZE 8192
#define PUTBLOCK_PERMS (S_IRUSR | S_IWUSR)

int putblock(char *fname, int blknum, char *data) {
    int error = 0;
    int file;

    if ((file = open(fname, O_WRONLY|O_CREAT, PUTBLOCK_PERMS)) == -1)
        return -1;
          /* compute the offset as off_t so large block numbers do not overflow int */
    if (lseek(file, (off_t)blknum*BLKSIZE, SEEK_SET) == -1)
        error = errno;
    else if (r_write(file, data, BLKSIZE) == -1)
        error = errno;
    if ((r_close(file) == -1) && !error)
        error = errno;
    if (!error)
        return 0;
    errno = error;
    return -1;
}
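One possible answer to Exercise 20.18 follows the structure of putblock. This sketch is not the book's solution: it opens the file read-only, seeks to the block, reads until it has BLKSIZE bytes or reaches end of file, and closes the file. Unlike putblock, it returns the number of bytes read (an assumption made so a caller can detect a short final block), and it uses plain `read` with an explicit `EINTR` retry instead of the restart.h helpers so that it stands alone.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BLKSIZE 8192

/* Stateless read of block blknum of fname into data (BLKSIZE bytes).
   Returns bytes read (fewer than BLKSIZE at end of file, 0 past it)
   or -1 with errno set. */
ssize_t getblock(const char *fname, int blknum, char *data) {
    ssize_t total = 0, n;
    int error = 0;
    int file;

    if ((file = open(fname, O_RDONLY)) == -1)
        return -1;
    if (lseek(file, (off_t)blknum * BLKSIZE, SEEK_SET) == -1)
        error = errno;
    while (!error && total < BLKSIZE) {
        n = read(file, data + total, BLKSIZE - total);
        if (n == -1 && errno == EINTR)
            continue;                /* interrupted by a signal: retry */
        if (n == -1)
            error = errno;
        else if (n == 0)
            break;                   /* end of file */
        else
            total += n;
    }
    if (close(file) == -1 && !error)
        error = errno;
    if (error) {
        errno = error;
        return -1;
    }
    return total;
}
```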
20.11.1 Remote File Services
A simple remote file service can be built from the getblock of Exercise 20.18 and putblock of Program 20.19. A server running on the machine containing the file system listens for client requests. Clients can send a request to read or write a block from a file. The server executes getblock or putblock on their behalf and returns the results. The client software translates user requests for reading and writing a file into requests to read and write specific blocks and makes the requests to the server.
This is a simplification of the strategy pursued by remote file services such as NFS. Real systems have caching at both ends: the client and the server keep recently accessed file blocks in memory to give better performance. File servers often bypass the file system table and use low-level device operations to read from and write to the disk. Of course, both sides must worry about authorization and credentials for making such requests.
A typical file service might provide the following services.
Read a particular block from a specified remote file.
Write a particular block to a specified remote file.
Create a new remote file or delete an existing one.
Create or delete a special remote file such as a directory.
Get the struct stat equivalent for a specified remote file.
Access or modify the permissions for a specified file.
Based on the file services that you might want to implement, devise a format for the client request and the server response. Discuss your strategy for handling errors and for dealing with network byte order.
Implement and test the portion of the remote file service for getting and putting single file blocks, using UDP with a request-reply-acknowledge protocol on the client side. Discuss how you would implement client-side libraries that read and write a stream of bytes using these single-block functions.
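For the client-side library, the central calculation maps a byte range onto block numbers. The hypothetical helper below assumes a non-negative offset and the same BLKSIZE as putblock. It reports the first block, the offset of the first byte within that block, and how many blocks the range touches.

```c
#include <stddef.h>
#include <sys/types.h>

#define BLKSIZE 8192

/* Hypothetical helper: blocks covering bytes [offset, offset + len).
   Assumes offset >= 0. Returns the number of blocks touched (0 if len
   is 0) and stores the first block and the offset within it. */
long blocks_for_range(off_t offset, size_t len, long *firstblk, size_t *firstoff) {
    off_t last;

    *firstblk = (long)(offset / BLKSIZE);
    *firstoff = (size_t)(offset % BLKSIZE);
    if (len == 0)
        return 0;
    last = offset + (off_t)len - 1;          /* last byte of the range */
    return (long)(last / BLKSIZE) - *firstblk + 1;
}
```

A read of len bytes at offset would then issue that many getblock requests, starting at firstblk and copying from firstoff in the first block. Because putblock always writes a full BLKSIZE block, a write that covers only part of a block needs a read-modify-write: fetch the block, patch the affected bytes, and put the whole block back.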
20.12 Additional Reading
UNIX Network Programming, Volume 1: Networking APIs: Sockets and XTI by Stevens [115] has an in-depth discussion of programming with UDP. TCP/IP Illustrated, Volume 1: The Protocols by Stevens explains the inner workings of the UDP protocol.