LinuxThreads Programming

By Matteo Dell'Omodarme


Some theory...

Introduction

LinuxThreads is a Linux library for multi-threaded programming. It provides kernel-level threads: threads are created with the clone() system call and all scheduling is done in the kernel. It implements the POSIX 1003.1c API (Application Programming Interface) for threads and runs on any Linux system with kernel 2.0.0 or later and a suitable C library.

What are threads?

A thread is a sequential flow of control through a program. Thus multi-threaded programming is a form of parallel programming where several threads of control are executing concurrently in the program.

Multi-threaded programming differs from Unix-style multi-processing in that all threads share the same memory space (and a few other system resources, such as file descriptors), instead of running in their own memory space as is the case with Unix processes. As a result, a context switch between two threads in a single process is considerably cheaper than a context switch between two processes.

There are two main reasons to use threads:

  1. Some programs reach their best performance only when expressed as several threads that communicate with each other (e.g. servers), rather than as a single flow of instructions.

  2. On a multiprocessor system, threads can run in parallel on several processors, allowing a single program to divide its work between different processors. Such programs run faster than a single-threaded program, which can exploit only one CPU at a time.

Atomicity and volatility

Accessing memory shared between threads requires some care, because a parallel program can't treat shared memory objects as if they were in ordinary local memory.

Atomicity refers to the concept that an operation on an object is accomplished as an indivisible, uninterruptible sequence. Operations on data in shared memory do not necessarily occur atomically. In addition, the GCC compiler will often perform optimizations that buffer the values of shared variables in registers, avoiding the memory operations needed to ensure that all processors can see changes to shared data.

To prevent GCC's optimizer from buffering the values of shared memory objects in registers, all objects in shared memory should be declared with the volatile attribute. Reads and writes of volatile objects that require just one word access will then typically occur atomically; volatile does not, however, make a compound operation such as an increment atomic, which is why locks are needed.

Locks

Loading a value and storing the result are separate memory transactions: ++i doesn't always add one to a variable i in shared memory, because another processor could access i between these two transactions. So having two threads both perform ++i might increment i only by one, rather than by two.

So you need a mechanism that prevents a thread from working on a variable while another one is changing its value. This mechanism is implemented by the lock scheme, explained just below.
Suppose that you have two threads running a routine which changes the value of a shared variable. To obtain the correct result the routine must:

  • assert a lock on variable i
  • modify the value of the locked variable
  • remove the lock
When a lock is asserted on a variable, only the thread which locked the variable can change its value. Moreover, the flow of the other thread is blocked at its lock assertion, since only one lock at a time is allowed for a variable. Only when the first thread removes the lock can the second one resume and assert its own lock.

Consequently, using shared variables may delay memory activity from other processors, whereas ordinary references may use the local cache.

... and some practice

The header pthread.h

The facilities provided by LinuxThreads are available through the header /usr/include/pthread.h, which declares the prototypes of the thread routines.

Writing a multi-threaded program is basically a two-step process:

  • use the pthread routines to assert locks on shared variables and generate the threads.
  • create a structure for all the parameters you must pass to the thread subroutine
Let's analyze the two steps, starting with a brief description of some basic pthread.h routines.

Initialize locks

One of the first things you must do is initialize all the locks. POSIX locks are declared as variables of type pthread_mutex_t; to initialize each lock you need, call the routine:

int pthread_mutex_init(pthread_mutex_t  *mutex,   
                        const pthread_mutexattr_t *mutexattr);
 
as in the construction:
#include <pthread.h>
 ...
  pthread_mutex_t lock;
  pthread_mutex_init(&lock,NULL);
 ...
 
The function pthread_mutex_init initializes the mutex object pointed to by mutex according to the mutex attributes specified in mutexattr. If mutexattr is NULL, default attributes are used instead.

The following sections show how to use these initialized locks.

Spawning threads

POSIX requires the user to declare a variable of type pthread_t to identify each thread.
A thread is generated by the call to:

 
 int pthread_create(pthread_t *thread, const pthread_attr_t *attr, 
                    void *(*start_routine)(void *), void *arg);
 
On success, the identifier of the newly created thread is stored in the location pointed to by the thread argument, and 0 is returned. On error, a non-zero error code is returned.

To create a thread running the routine f() and pass f() a pointer to the variable arg, use:

#include <pthread.h>
 ...
  pthread_t thread;
  pthread_create(&thread, NULL, f, &arg);
 ...
 
The routine f() must have the prototype:
void *f(void *arg);
 
Clean termination

As the last step you need to wait for the termination of all the threads spawned before accessing the result of the routine f(). The call to:

  
 int pthread_join(pthread_t th, void **thread_return);
 
suspends the execution of the calling thread until the thread identified by th terminates.
If thread_return is not NULL, the return value of th is stored in the location pointed to by thread_return.

Passing data to a thread routine

There are two ways to pass information from a caller routine to a thread routine:

  • Global variables
  • Structures
The second one is the better choice, since it preserves the modularity of the code.
The structure must contain three levels of information: first, information about the shared variables and locks; second, all the data needed by the routine; third, an identification index distinguishing among the threads, together with the number of CPUs the program can exploit (which makes it easy to provide this information at run time).

Let's inspect the first level of that structure. The information passed must be shared among all the threads, so you must use pointers to the needed variables and locks. To pass a shared variable var of type double, and its lock, the structure must contain two members:

  double volatile *var;
   pthread_mutex_t *var_lock;
 
Note the placement of the volatile attribute: it specifies that var, not the pointer itself, is volatile.

Example of parallel code

An example of a program which can easily be parallelized using threads is the computation of the scalar product of two vectors.
The code is shown below, with comments inserted.

/* compile with: gcc -D_REENTRANT <source>.c -lpthread */
 
 #include <stdio.h>
 #include <stdlib.h>   /* exit, atoi, calloc */
 #include <pthread.h>
 
 /* definition of a suitable structure */ 
 typedef struct
 {
   double volatile *p_s;       /* the shared value of scalar product */
   pthread_mutex_t *p_s_lock;  /* the lock for variable s */
   int n;                      /* the number of the thread */
   int nproc;                  /* the number of processors to exploit */
   double *x;                  /* data for first vector */
   double *y;                  /* data for second vector */
   int l;                      /* length of vectors */
 } DATA;
 
 void *SMP_scalprod(void *arg)
 {
   register double localsum;   
   long i;
   DATA D = *(DATA *)arg;
 
   localsum = 0.0;
   
 /* Each thread starts calculating the scalar product from i = D.n,
    with D.n = 0, 1, ... , D.nproc-1.
    Since there are exactly D.nproc threads, the increment on i is just
    D.nproc */ 
   
   for(i=D.n;i<D.l;i+=D.nproc)
      localsum += D.x[i]*D.y[i];
   
 /* the thread asserts the lock on s ... */
   pthread_mutex_lock(D.p_s_lock);
 
 /* ... change the value of s ... */
   *(D.p_s) += localsum;
 
 /* ... and remove the lock */
   pthread_mutex_unlock(D.p_s_lock);
 
   return NULL;
 }
 
 #define L 9    /* dimension of vectors */
 
 int main(int argc, char **argv)
 {
   pthread_t *thread;    
   void *retval;
   int cpu, i;
   DATA *A;
   volatile double s=0;     /* the shared variable */ 
   pthread_mutex_t s_lock; 
   double x[L], y[L];
   
   if(argc != 2)
     {  
       printf("usage: %s  <number of CPU>\n", argv[0]);
       exit(1);
     }
 	
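   cpu = atoi(argv[1]);
   if(cpu < 1)               /* atoi returns 0 for non-numeric input */
     {
       fprintf(stderr, "%s: number of CPU must be at least 1\n", argv[0]);
       exit(1);
     }
   thread = (pthread_t *) calloc(cpu, sizeof(pthread_t));
   A = (DATA *)calloc(cpu, sizeof(DATA));
   if(thread == NULL || A == NULL)
     {
       fprintf(stderr, "%s: out of memory\n", argv[0]);
       exit(1);
     }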
 
  
   for(i=0;i<L;i++)
     x[i] = y[i] = i;
 
 /* initialize the lock variable */
   pthread_mutex_init(&s_lock, NULL);
   
   for(i=0;i<cpu;i++)
     {
 /* initialize the structure */
       A[i].n = i;            /* the number of the thread */
       A[i].x = x;
       A[i].y = y;
       A[i].l = L;
       A[i].nproc = cpu;      /* the number of CPU */
       A[i].p_s = &s;
       A[i].p_s_lock = &s_lock;
 
       if(pthread_create(&thread[i], NULL, SMP_scalprod, &A[i] ))
 	{
 	  fprintf(stderr, "%s: cannot make thread\n", argv[0]);
 	  exit(1);
 	}
     }
 
   for(i=0;i<cpu;i++)
     {
       if(pthread_join(thread[i], &retval))
 	{
 	  fprintf(stderr, "%s: cannot join thread\n", argv[0]);
 	  exit(1);
 	}
     }
 
   printf("s = %f\n", s);
   exit(0);
 }
 

Thread-Specific Data and Signal Handling in Multi-Threaded Applications

Here are the answers to questions about signal handling and taking care of global data when writing multi-threaded programs.

Perhaps the two most common questions I'm asked about multi-threaded programming (after ``what is multi-threaded programming?'' and ``why would you want to do it?'') concern how to handle signals, and how to handle cases where two concurrent threads use a common function that makes use of global data, and yet the two threads need thread-specific data from that function. By definition, global data includes static local variables which are in truth a kind of global variable. In this article I'll explain how these questions can be dealt with in C programs using one of the POSIX (or almost POSIX) multi-threading packages available for Linux. I live in hope of the day when the most common question I'm asked about multi-threaded programming is, ``Can we give you lots of money to write this simple multi-threaded application, please?'' Hey--I can dream, can't I?

All the examples in this article make use of POSIX compliant functionality. To the best of my knowledge at the time I write this, there are no fully POSIX-compliant multi-threading libraries available for Linux. Which of the available libraries is best is something of a subjective issue. I use Xavier Leroy's LinuxThreads package, and the code fragments and examples were tested using version 0.5 of this library. This package can be obtained from http://pauillac.inria.fr/~xleroy/linuxthreads. Christopher Provenzano has a good user-level library, although the signal handling doesn't yet match the spec, and there were still a number of serious bugs the last time I used it. (These bugs, I believe, are being worked on.) Other library implementations are also available. Information on these and other packages can be found in the comp.programming.threads newsgroup and (to give a less than exhaustive list):

  • http://www.mit.edu:8001/people/proven/pthreads.html

  • http://www.aa.net/~mtp/PCthreads.html

  • ftp://ftp.cs.fsu.edu/pub/PART/PTHREADS

Thread-specific data

As I implied above, I use the term ``global data'' for any data which persists beyond normal scoping rules, such as static local variables. Given a piece of code like:

void foo(void)
 {
         static int i = 1;
         printf( "%d\n", i );
         i = 2;
 }
 
 

the first call to this function will print the value 1, and all subsequent calls will print the value 2, because the variable i and its value persist from one invocation of the function to the next, rather than disappearing in a puff of smoke as a ``normal'' local variable would. This, at least as far as POSIX threads are concerned, is global data.

It is commonly said (I've said it myself) that using global data is a bad practice. Whether or not this is true, it is only a rule of thumb. Certainly there are situations where using global data can avoid creating artificial circumstances. The previous article (Linux Journal Issue 34) explained how threads can share global data with careful use of mutual exclusion (mutex) functions to prevent one thread from accessing an item of global data while another thread is changing its value. In this article I will look at a different type of problem, using a real example from a recent project of mine.

Consider the case of a virtual reality system where a client makes several network socket connections to a server. Different types and priorities of data go down different sockets. High priority data, such as information about objects immediately in the field of view of the client, is sent down one socket. Lower priority data such as texture information, background sounds, or information about objects which are out of the current field of view, is sent down another socket to be processed whenever the client has available time. The server could create a collection of new threads every time a new client connects to the server, designating one thread for each of the sockets to be used to talk to each of the clients. Every one of these threads could use the same function to send a lump of data (not a technical term) to the client. The data to be sent, details of the client it is to be sent to, and the priority and type of data to be sent could all be held in global variables, and yet each thread will make use of different values. So how do we do it?

As a trivial example, suppose the only global data which our lump-sending function needs to use is an integer that indicates the priority of the data. In a non-threaded version, we might have a global integer called priority used as in Listing 1.

In the multi-threaded version we don't have a global integer; instead we have a global key to the integer. It is through the key that the data can be accessed, by means of a number of functions:

  1. pthread_key_create() to prepare the key for use

  2. pthread_setspecific() to set a value to thread-specific data

  3. pthread_getspecific() to retrieve the current value

pthread_key_create() is called once, generally before any of the threads which are going to use the key have been created. pthread_getspecific() and pthread_setspecific() are not required to return an error if the key that is used as an argument has not been created. The result of using them on a key which has not been created is undefined. Something will happen, but it could vary from system to system, and it can't be caught simply by using good error handling. This is an excellent source of bugs for the unwary. So our multi-threaded version might look like Listing 2.

There are a few things to note here:

  1. The implementation of POSIX threads can limit the number of keys a process may use. The standard states that this number must be at least 128. The number available in any implementation can be found by looking at the macro PTHREAD_KEYS_MAX. According to this macro, LinuxThreads currently allows 128 keys.

  2. The function pthread_key_delete() can be used to dispose of keys that are no longer needed. Keys, like all ``normal'' data items, vanish when the process exits, so why bother deleting them? Think of key handling as being similar to file handling. An unsophisticated program need not close any files that it has opened, as they will be automatically closed when the program exits. But since there is a limit to the number of files a program can have open at one time, the best policy is to close files not currently being used so that the limit is not exceeded. This policy also works well for key handling, as you may be limited in the number of thread-specific data keys a process may have.

  3. pthread_getspecific() and pthread_setspecific() access thread-specific data as void* pointers. This ability can be used directly (as in Listing 2), if the data item to be accessed can be cast as type void*, e.g., an int in most, but not necessarily all, implementations. However, if you want your code to be portable or if you need to access larger data objects, then each thread must allocate sufficient memory for the data object, and store the pointer to the object in the thread-specific data rather than storing the data itself.

  4. If you allocate some memory (using the standard function malloc(), for instance) for your thread-specific data, and the thread exits at some point, what happens to the allocated memory? Nothing happens, so it leaks, and this is bad. This is the situation where the extra parameter in the pthread_key_create() function comes into use. This parameter allows you to specify a function to call when a thread exits, and you use that function to free up any memory that has been allocated. To prevent a waste of CPU time, this destructor function is called only in the case where a thread has made use of that particular key. There's little point in tidying up for a thread that has nothing to be tidied. When a thread exits because it called one of the functions exit(), _exit() or abort(), the destructor function is not called. Also, note that pthread_key_delete() does not cause any destructors to be called, that using a key that has been deleted doesn't have a defined behavior, and that pthread_getspecific() and pthread_setspecific() are not required to report an error for such a key. Tidy up your keys carefully. One day you'll be glad you did. So a better version of our code is Listing 3.

Some of this code might look a little strange at first sight. Using pthread_getspecific() to store a thread specific value? The idea is to get the memory location this thread is to use, and then the thread specific value is stored there.

Even if global data is anathema to you, you might still have good use for thread-specific data. In particular, you might need to write a multi-threaded version of some existing library code that is also going to be used in a non-threaded program. A good simple example is making a version of the standard C libraries fit for use by multi-threaded programs. That friend of all C programmers, errno, is a global variable that is commonly set by library functions to indicate what went wrong during a function call. If two threads call functions which both set errno to different values, at least one of the threads is going to get the wrong information. This is solved by having thread-specific data areas for errno, rather than one global variable used by all threads.

Signal Handling

Many people find signal handling in C to be a bit tricky at the best of times. Multi-threaded applications need a little extra care when it comes to signal handling, but once you've written two programs, you'll wonder what all the fuss was about--trust me. And if you start to panic, remember--deep, slow breaths.

A quick re-cap of what signals are. Signals are the system's way of informing a process about various events. There are two types of signals, synchronous and asynchronous.

Synchronous signals are a result of a program action. Two examples are:

  1. SIGFPE, floating-point exception, is raised when the program tries to do some illegal mathematical operation such as dividing by zero.

  2. SIGSEGV, segmentation violation, is raised when the program tries to access an area of memory outside the area it can legally access.

Asynchronous signals are independent of the program. An example is the signal sent when the user issues the kill command.

In non-threaded applications there are three usual ways of handling signals:

  1. Pretend they don't exist, perhaps the most common policy, and quite adequate for lots of simple programs--at least until you want your program to be reliable and useful.

  2. Use signal() to set up a signal handler--nice and simple, but not very robust.

  3. Use the POSIX signal handling functions such as sigaction() and sigprocmask() to set up a signal handler or to ignore certain signals--the ``proper'' method.

If you choose the first option, then signals will have some default behavior. Typically, this default behavior will cause the program to exit or cause the program to ignore the signal, depending on what the signal is. The latter two options allow you to change the default behavior for each signal type--ignore the signal, cause the program to exit or invoke a signal-handling function to allow your program to perform some special processing. Avoid the use of the old-style signal() function. Whether you're writing threaded or non-threaded applications, the extra complications of the POSIX-style functions are worth the effort. Note that the behavior of sigprocmask(), which sets a signal mask for a process, is undefined in a multi-threaded program. There is a new function, pthread_sigmask(), that is used in much the same way as sigprocmask(), but it sets the signal mask only for the current thread. Also, a new thread inherits the signal mask of the thread that created it; so a signal mask can effectively be set for an entire process by calling pthread_sigmask() before any threads are created.

In a multi-threaded application, there is always the question of which thread the signal will actually be delivered to. Or does it get delivered to all the threads?

To answer the last question first, no. If one signal is generated, one signal is delivered, so any single signal will only be delivered to a single thread.

So which thread will get the signal? If it is a synchronous signal, the signal is delivered to the thread that generated it. Synchronous signals are commonly managed by having an appropriate signal handler set up in each thread to handle any that aren't masked. If it is an asynchronous signal, it could go to any of the threads that haven't masked out that signal using pthread_sigmask(). This makes life even more complicated. For instance, suppose your signal handler must access a global variable. This is normally handled quite happily by using a mutex, as follows:

void signal_handler( int sig )
 {
         ...
         pthread_mutex_lock( &mutex1 );
         ...
         pthread_mutex_unlock( &mutex1 );
         ...
 }
 
 

Looks fine at first sight. However, what if the thread that was interrupted by the signal had just itself locked mutex1? The signal_handler() function will block, and will wait for the mutex to be unlocked. And the thread that is currently holding the mutex will not restart, and so will not be able to release the mutex until the signal handler exits. A nice deadly embrace.

So a common way of handling asynchronous signals in a multi-threaded program is to mask signals in all the threads, and then create a separate thread (or threads) whose sole purpose is to catch signals and handle them. The signal-handler thread catches signals by calling the function sigwait() with details of the signals it wishes to wait for. To give a simple example of how this might be done, take a look at Listing 4.

As mentioned earlier, a thread inherits its signal mask from the thread which creates it. The main() function sets the signal mask to block all signals, so all threads created after this point will have all signals blocked, including the signal-handling thread. Strange as it may seem at first sight, this is exactly what we want. The signal-handling thread expects signal information to be provided by the sigwait() function, not directly by the operating system. sigwait() will unmask the set of signals that are given to it, and then will block until one of those signals occurs.

Also, you might think that this program will deadlock if a signal is raised while the main thread holds the mutex sig_mutex. After all, the signal handler tries to grab that same mutex, and it will block until that mutex comes free. However, the main thread is blocking signals rather than being interrupted by them, so there is nothing to prevent it, or any other thread, from running and releasing the mutex while the signal-handling thread is blocked. In this case, sig_handler() hasn't caught a signal in the usual, non-threaded sense. Instead it has asked the operating system to tell it when a signal has been raised. The operating system has performed this function, and so the signal-handling thread becomes just another running thread.

Differences in Signal Handling between POSIX Threads and LinuxThreads

Listing 4 shows how to deal with signals in a multi-threading environment that handles threads in a POSIX compliant way.

Personally, I like the kernel-level package ``LinuxThreads'' that makes use of Linux 2.0's clone() system call to create new threads. At some point in the future, the clone() call may implement the CLONE_PID flag, which would allow all the threads to share a process ID. Until then, each thread created using ``LinuxThreads'' (or any other package which chooses to use clone() to create threads) will have its own unique process ID. As such, there is no concept of sending a signal to ``the process.'' If one thread calls sigwait() and all other threads block signals, only those signals which are specifically sent to the sigwait()-ing thread will be processed. Depending on your application, this could mean that you have no choice other than to include an asynchronous signal handler in each of the threads.
