Introduction To Unix Signals Programming
Table Of Contents
- What Are Signals?
- What Are Signals Used For?
- Sending Signals To Processes
- Catching Signals - Signal Handlers
- Installing Signal Handlers
- Avoiding Signal Races - Masking Signals
- Implementing Timers Using Signals
- Summary - "Do" and "Don't" inside A Signal Handler
What Are Signals?
Signals, to be short, are various notifications sent to a process
in order to notify it of various "important" events. By their nature,
they interrupt whatever the process is doing at this minute, and force
it to handle them immediately. Each signal has an integer number that
represents it (1, 2 and so on), as well as a symbolic name that is
usually defined in the file /usr/include/signal.h or one of the
files included by it directly or indirectly (HUP ,
INT and so on. Use the command 'kill -l' to see
a list of signals supported by your system).
Each signal may have a signal handler, which is a function that gets
called when the process receives that signal. The function is called
in "asynchronous mode", meaning that no where in your program you have
code that calls this function directly. Instead, when the signal is
sent to the process, the operating system stops the execution of the
process, and "forces" it to call the signal handler function. When that
signal handler function returns, the process continues execution from
wherever it happened to be before the signal was received, as if
this interruption never occurred.
Note for "hardwarists": If you are familiar with interrupts (you are,
right?), signals are very similar in their behavior. The difference
is that while interrupts are sent to the operating system by the hardware,
signals are sent to the process by the operating system, or by other
processes. Note that signals have nothing to do with software interrupts,
which are still sent by the hardware (the CPU itself, in this case).
What Are Signals Used For?
Signals are usually used by the operating system to notify processes that some
event occurred, without these processes needing to poll for the event. Signals
should then be handled, rather then used to create an
event notification mechanism for a specific application.
When we say that "Signals are being handled", we mean that our program is
ready to handle such signals that the operating system might be sending it
(such as signals notifying that the user asked to terminate it, or that
a network connection we tried writing into, was closed, etc). Failing to
properly handle various signals, would likely cause our application to
terminate, when it receives such signals.
Sending Signals To Processes
Sending Signals Using The Keyboard
The most common way of sending signals to processes is using the keyboard.
There are certain key presses that are interpreted by the system as requests
to send signals to the process with which we are interacting:
- Ctrl-C
-
Pressing this key causes the system to send an
INT signal
(SIGINT ) to the
running process. By default, this signal causes the process to immediately
terminate.
- Ctrl-Z
-
Pressing this key causes the system to send a
TSTP signal
(SIGTSTP ) to the
running process. By default, this signal causes the process to suspend
execution.
- Ctrl-\
-
Pressing this key causes the system to send a
ABRT signal
(SIGABRT ) to the
running process. By default, this signal causes the process to immediately
terminate. Note that this redundancy (i.e. Ctrl-\ doing the same as Ctrl-C)
gives us some better flexibility. We'll explain that later on.
Sending Signals From The Command Line
Another way of sending signals to processes is done using various
commands, usually internal to the shell:
kill
-
The kill command accepts two parameters: a signal name (or number), and a
process ID. Usually the syntax for using it goes something like:
kill -<signal> <PID>
For example, in order to send the INT signal to process with
PID 5342, type:
kill -INT 5342
This has the same affect as pressing Ctrl-C in the shell that runs
that process.
If no signal name or number is specified, the default is to send a
TERM signal to the process, which normally causes its
termination, and hence the name of the kill command.
fg
-
On most shells, using the
'fg' command will resume execution
of the process (that was suspended with Ctrl-Z), by sending it a
CONT signal.
Sending Signals Using System Calls
A third way of sending signals to processes is by using the kill system
call. This is the normal way of sending a signal from one process to another.
This system call is also used by the 'kill' command or by the
'fg' command.
Here is an example code that causes a process to suspend its own execution
by sending itself the STOP signal:
#include <unistd.h> /* standard unix functions, like getpid() */
#include <sys/types.h> /* various type definitions, like pid_t */
#include <signal.h> /* signal name macros, and the kill() prototype */
/* first, find my own process ID */
pid_t my_pid = getpid();
/* now that i got my PID, send myself the STOP signal. */
kill(my_pid, SIGSTOP);
An example of a situation when this code might prove useful, is inside a
signal handler that catches the TSTP signal (Ctrl-Z, remember?) in order
to do various tasks before actually suspending the process. We will see
an example of this later on.
Catching Signals - Signal Handlers
Catchable And Non-Catchable Signals
Most signals may be caught by the process, but there are a few signals
that the process cannot catch, and cause the process to terminate. For example,
the KILL signal (-9 on all unices I've met so far)
is such a signal. This is why you usually see a process being shut down using
this signal if it gets "wild". One process that uses this signal is a system
shutdown process. It first sends a TERM signal to all processes,
waits a while, and after allowing them a "grace period" to shut down cleanly,
it kills whichever are left using the KILL signal.
STOP is also a signal that a process cannot catch, and forces
the process's suspension immediately. This is useful when debugging programs
whose behavior depends on timing. Suppose that process A needs to send
some data to process B, and you want to check some system parameters after
the message is sent, but before it is received and processed by process B.
One way to do that would be to send a STOP signal to process B,
thus causing its suspension, and then running process A and waiting until
it sends its oh-so important message to process B. Now you can check whatever
you want to, and later on you can use the CONT signal to continue
process B's execution, which will then receive and process the message sent from
process A.
Now, many other signals are catchable, and this includes the famous
SEGV and BUS signals. You probably have seen
numerous occasions when a program has exited with a message such as
'Segmentation Violation - Core Dumped', or
'Bus Error - core dumped'. In the first occasion, a SEGV
signal was
sent to your program due to accessing an illegal memory address. In the
second case, a BUS signal was sent to your program, due to
accessing a memory address with invalid alignment. In both cases, it is
possible to catch these signals in order to do some cleanup - kill child
processes, perhaps remove temporary files, etc. Although in both cases,
the memory used by your process is most likely corrupt, it's probable that
only a small part of it was corrupt, so cleanup is still usually possible.
Default Signal Handlers
If you install no signal handlers of your own (remember what a signal handler
is? yes, that function handling a signal?), the runtime environment sets up
a set of default signal handlers for your program. For example, the default
signal handler for the TERM signal calls the
exit() system call. The default handler for the ABRT
is to dump the process's memory image into a file named 'core' in the process's
current directory, and then exit. Note that the 'core' file might actually have
some suffix (such as 'core.567') on some systems.
Installing Signal Handlers
There are several ways to install signal handlers. We'll use the most
basic form here, and refer you to your manual pages for further reading.
The signal() System Call
The signal() system call is used to set a signal handler for a
single signal type. signal() accepts a signal number and a
pointer to a signal handler function, and sets that handler to accept the
given signal. As an example, here is a code snippest that causes the program
to print the string "Don't do that" when a user presses Ctrl-C:
#include <stdio.h> /* standard I/O functions */
#include <unistd.h> /* standard unix functions, like getpid() */
#include <sys/types.h> /* various type definitions, like pid_t */
#include <signal.h> /* signal name macros, and the signal() prototype */
/* first, here is the signal handler */
void catch_int(int sig_num)
{
/* re-set the signal handler again to catch_int, for next time */
signal(SIGINT, catch_int);
/* and print the message */
printf("Don't do that");
fflush(stdout);
}
.
.
.
/* and somewhere later in the code.... */
.
.
/* set the INT (Ctrl-C) signal handler to 'catch_int' */
signal(SIGINT, catch_int);
/* now, lets get into an infinite loop of doing nothing. */
for ( ;; )
pause();
The complete source code for this program is found in the
catch-ctrl-c.c file.
Notes:
- the
pause() system call causes the process to
halt execution, until a signal is received. it is surely better than a
'busy wait' infinite loop.
- the name of a function in C/C++ is actually a pointer
to the function, so when you're asked to supply a pointer to a function,
you may simply specify its name instead.
- On some systems (such as Linux), when a signal handler is called,
the system automatically resets the signal handler for that signal to
the default handler. Thus, we re-assign the signal handler immediately
when entering the handler function. Otherwise, the next time this
signal is received, the process will exit (default behavior for
INT signals). Even on systems that do not behave in this way,
it still won't hurt, so adding this line always is a good idea.
Pre-defined Signal Handlers
For our convenience, there are two pre-defined signal handler functions
that we can use, instead of writing our own: SIG_IGN and
SIG_DFL .
-
SIG_IGN :
-
Causes the process to ignore the specified signal. For example, in order
to ignore Ctrl-C completely (useful for programs that must NOT be
interrupted in the middle, or in critical sections), write this:
signal(SIGINT, SIG_IGN);
-
SIG_DFL :
-
Causes the system to set the default signal handler for the given signal
(i.e. the same handler the system would have assigned for the signal when
the process started running):
signal(SIGTSTP, SIG_DFL);
Avoiding Signal Races - Masking Signals
One of the nasty problems that might occur when handling a signal, is
the occurrence of a second signal while the signal handler function executes.
Such a signal might be of a different type than the one being handled, or
even of the same type. Thus, we should take some precautions inside the signal
handler function, to avoid races.
Luckily, the system also contains some features that will allow us to
block signals from being processed. These can be used in two 'contexts' -
a global context which affects all signal handlers, or a per-signal type
context - that only affects the signal handler for a specific signal type.
Masking signals with sigprocmask()
the (modern) "POSIX" function used to mask signals in the global context,
is the sigprocmask() system call. It allows us to specify a set
of signals to block, and returns the list of signals that were previously
blocked. This is useful when we'll want to restore the previous masking
state once we're done with our critical section.
Note: each process on a unix system has its own signals mask, which
is used by the operating system to specify which signals should be delivered
to the proces, and which should be blocked. The sigprocmask
system call is used to take a signals mask we created in user space,
and update the one in stored in the kernel, using this user-space mask.
The mask stored in the kernel is the one later considered by the operating
system, when deciding whether to deliver a signal to the process, or block it.
sigprocmask() accepts 3 parameters:
int how
- defines if we want to add signals to the current process's
mask (
SIG_BLOCK ), remove them from the current mask
(SIG_UNBLOCK ),
or completely replace the current mask with the new mask
(SIG_SETMASK ).
const sigset_t *set
- The set of signals to be blocked, or to be added to the current mask, or
removed from the current mask (depending on the 'how' parameter).
sigset_t *oldset
- If this parameter is not
NULL , then it'll contain the
previous mask. We can later use this set to restore the situation
back to how it was before we called sigprocmask() .
Note: Older systems do not support the sigprocmask()
system call. Instead, one should use the sigmask() and
sigsetmask() system calls. If you have
such an operating system handy, please read the manual pages for these
system calls. They are simpler to use than sigprocmask, so it shouldn't
be too hard understanding them once you've read this section.
You probably wonder what are these sigset_t variables, and how
they are manipulated. Well, i wondered too, so i went to the manual page of
sigsetops , and found the answer. There are several functions to
handle these sets. Lets learn them using some example code:
/* define a new mask set */
sigset_t mask_set;
/* first clear the set (i.e. make it contain no signal numbers) */
sigemptyset(&mask_set);
/* lets add the TSTP and INT signals to our mask set */
sigaddset(&mask_set, SIGTSTP);
sigaddset(&mask_set, SIGINT);
/* and just for fun, lets remove the TSTP signal from the set. */
sigdelset(&mask_set, SIGTSTP);
/* finally, lets check if the INT signal is defined in our set */
if (sigismember(&mask_set, SIGINT)
printf("signal INT is in our set\n");
else
printf("signal INT is not in our set - how strange...\n");
/* finally, lets make the set contain ALL signals available on our system */
sigfillset(&mask_set)
Now that we know all these little secrets, lets see a short code example
that counts the number of Ctrl-C signals a user has hit, and on the 5th time
(note - this number was "Stolen" from some quite famous Unix program) asks
the user if they really want to exit. Further more, if the user hits Ctrl-Z,
the number of Ctrl-C presses is printed on the screen.
/* first, define the Ctrl-C counter, initialize it with zero. */
int ctrl_c_count = 0;
#define CTRL_C_THRESHOLD 5
/* the Ctrl-C signal handler */
void catch_int(int sig_num)
{
sigset_t mask_set; /* used to set a signal masking set. */
sigset_t old_set; /* used to store the old mask set. */
/* re-set the signal handler again to catch_int, for next time */
signal(SIGINT, catch_int);
/* mask any further signals while we're inside the handler. */
sigfillset(&mask_set);
sigprocmask(SIG_SETMASK, &mask_set, &old_set);
/* increase count, and check if threshold was reached */
ctrl_c_count++;
if (ctrl_c_count >= CTRL_C_THRESHOLD) {
char answer[30];
/* prompt the user to tell us if to really exit or not */
printf("\nRealy Exit? [y/N]: ");
fflush(stdout);
gets(answer);
if (answer[0] == 'y' || answer[0] == 'Y') {
printf("\nExiting...\n");
fflush(stdout);
exit(0);
}
else {
printf("\nContinuing\n");
fflush(stdout);
/* reset Ctrl-C counter */
ctrl_c_count = 0;
}
}
/* no need to restore the old signal mask -
this is done automatically, */{{/COMMENT_FONT}*/
/* by the operating system,
when a signal handler returns. */{{/COMMENT_FONT}*/
}
/* the Ctrl-Z signal handler */
void catch_suspend(int sig_num)
{
sigset_t mask_set; /* used to set a signal masking set. */
sigset_t old_set; /* used to store the old mask set. */
/* re-set the signal handler again to catch_suspend, for next time */
signal(SIGTSTP, catch_suspend);
/* mask any further signals while we're inside the handler. */
sigfillset(&mask_set);
sigprocmask(SIG_SETMASK, &mask_set, &old_set);
/* print the current Ctrl-C counter */
printf("\n\nSo far, '%d' Ctrl-C presses were counted\n\n", ctrl_c_count);
fflush(stdout);
/* no need to restore the old signal mask -
this is done automatically, */{{/COMMENT_FONT}*/
/* by the operating system,
when a signal handler returns. */{{/COMMENT_FONT}*/
}
.
.
/* and somewhere inside the main function... */
.
.
/* set the Ctrl-C and Ctrl-Z signal handlers */
signal(SIGINT, catch_int);
signal(SIGTSTP, catch_suspend);
.
.
/* and then the rest of the program */
.
.
The complete source code for this program is found in the
count-ctrl-c.c file.
You should note that using sigprocmask() the way we did now
does not resolve all possible race conditions. For example, It is possible
that after we entered the signal handler, but before we managed to call the
sigprocmask() system call, we receive another signal, which WILL
be called. Thus, if the user is VERY quick (or the system is very slow),
it is possible to get into races. In our current functions, this will
probably not disturb the flow, but there might be cases where this kind
of race could cause problems.
The way to guarantee no races at all, is to
let the system set the signal masking for us before it calls the signal
handler. This can be done if we use the sigaction() system
call to define both the signal handler function AND the signal mask to be used
when the handler is executed. You would probably be able to read the manual
page for sigaction() on your own, now that you're familiar with
the various concepts of signal handling. On old systems, however, you won't
find this system call, but you still might find the sigvec()
call, that enables a similar functionality.
One final note - signal handling for processes is done quite differently then
it is done for threads. with threads, the situation is more complicated
(e.g. signals masked by one thread could still be sent to another thread).
How signals are handled for threads, is out of the scope of this tutorial.
Implementing Timers Using Signals
One of the weak aspects of Unix-like operating systems is their lack of
proper support for timers. Timers are important to allow one to check timeouts
(e.g. wait for user input up to 30 seconds, or exit), check some conditions
on a regular basis (e.g. check every 30 seconds that a server we're talking to
is still active, or close the connection and notify the user about the problem),
and so on. There are various ways to get around the problem for programs that
use an "event loop" based on the select() system call (or its
new replacement, the poll() system call), but not all programs
work that way, and this method is too complex for short and simple programs.
Yet, the operating system gives us a simple way of setting up timers that
don't require too much hassles, by using special alarm signals. They are
generally limited to one timer active at a time, but that will suffice in
simple cases.
The alarm() System Call
The alarm() system call is used to ask the system to send our
process a special signal, named ALRM , after a given number of
seconds. Since Unix-like systems don't operate as real-time systems,
your process might receive this signal after a longer time than requested.
Combining this system call with a proper signal handler enables us to
do some simple tricks. Lets see an example of a program that waits for
user input, but exits if none was given after a certain timeout.
#include <unistd.h> /* standard unix functions, like alarm() */
#include <signal.h> /* signal name macros, and the signal() prototype */
char user[40]; /* buffer to read user name from the user */
/* define an alarm signal handler. */
void catch_alarm(int sig_num)
{
printf("Operation timed out. Exiting...\n\n");
exit(0);
}
.
.
/* and inside the main program... */
.
.
/* set a signal handler for ALRM signals */
signal(SIGALRM, catch_alarm);
/* prompt the user for input */
printf("Username: ");
fflush(stdout);
/* start a 30 seconds alarm */
alarm(30);
/* wait for user input */
gets(user);
/* remove the timer, now that we've got the user's input */
alarm(0);
.
.
/* do something with the received user name */
.
.
The complete source code for this program is found in the
use-alarms.c file.
As you can see, we start the timer right before waiting for user input.
If we started it earlier, we'll be giving the user less time than promised
to perform the operation. We also stop the timer right after getting the
user's input, before doing any tests or processing of the input, to avoid
a random timer from setting off due to slow processing of the input. Many
bugs that occur using the alarm() system call occur due to
forgetting to set off the timer at various places when the code gets
complicated. If you need to make some tests during the input phase,
put the whole piece of code in a function, so it'll be easier to make sure
that the timer is always set off after calling the function. Here is an
example of how NOT to do this:
int read_user_name(char user[])
{
int ok_len;
int result = 0; /* assume failure */
/* set an alarm to timeout the input operation */
alarm(3); /* Mistake 1 */
printf("Enter user name: "); /* Mistake 2 */
fflush(stdout);
if (gets(user) == NULL) {
printf("End of input received.\n");
return 0; /* Mistake 3 */
}
/* count the size of prefix of 'user' made only of characters */
/* in the given set. if all characters in 'user' are in the */
/* set, then ok_len with be equal to the length of 'user'. */
ok_len = strspn(user, "abcdefghijklmnopqrstuvwxyz0123456789");
if (ok_len == strlen(user)) {
/* check if the user exists in our database */
result = find_user_in_database(user); /* Mistake 4 */
}
alarm(0);
return result;
}
Lets count the mistakes and bad programming practices in the above function:
- * Too short timeout:
3 seconds might be enough for superman to type his user name,
but a normal user obviously needs more time. Such a timeout should
actually be tunable, because not all people type at the same pace,
or should be long enough for even the slowest of users.
- * Printing while timer is ticking:
Norty Norty. This printing should have been done before starting up
the timer. Printing may be a time consuming operation, and thus will leave
less time than expected for the user to type in the expected input.
- * Exiting function without turning off timer:
This kind of mistake is hard to catch. It will cause the program to
randomally exit somewhere LATER during the execution. If we'll trace it with
a debugger, we'll see that the signal was received while we were already
executing a completely different part of the program, leaving us scratching
our head and looking up to the sky, hoping somehow inspiration will fall
from there to guide us to the location of the problem.
- * Making long checks before turning off timer:
As if it's not enough that we gave the poor user a short time to check
for the input, we're also inserting some database checking operation
while the timer is still ticking. Even if we also want to timeout
the database operation, we should probably set up a different timer
(and a different ALRM signal handler), so as not to confuse
a slow user with a slow database server. It will also allow the user to
know why we are timing out. Without this information, a person
trying to figure out why the program suddenly exits, will have hard time
finding where the fault lies.
Summary - "Do" and "Don't" inside A Signal Handler
We have seen a few "thumb rules" all over this tutorial, and there are quite
many of those, as the area of signals handling is rather tricky. Lets try
to summarize these rules of thumb here:
- Make it short - the signal handler should be a short function that
returns quickly. Instead of doing complex operations inside the signal
handler, it is better that the function will raise a flag (e.g.
a global variable, although these are evil by themselves) and have
the main program check that flag occasionally.
- Proper Signal Masking - don't be too lazy to define proper signal
masking for a signal handler, preferably using the
sigaction() system call. It takes a little more effort
than just using the signal() system call, but it'll help
you sleep better at night, knowing that you haven't left an extra place
for race conditions to occur. Remember - if some bug has a probability
of 1/10,000 to occur, it WILL occur when many people use that program
many times, as tends to be the case with good programs (you write only
good programs, no?).
- Careful with "fault" signals - If you catch signals that indicate
a program bug (
SIGBUS , SIGSEGV ,
SIGFPE ), don't try to be too smart and
let the program continue, unless you know exactly what you are doing
(which is a very rare case) - just do the minimal required cleanup,
and exit, preferably with a core dump (using the abort()
function). Such signals usually indicate a bug in the program, that if
ignored will most likely cause it to crush sooner or later, making
you think the problem is somewhere else in the code.
- Careful with timers - when you use timers, remember that you
can only use one timer at a time, unless you also (ab)use the
VTALRM
signal. If you need to have more than one timer active at a time,
don't use signals, or devise a set of functions that will allow you
to have several virtual timers using a delta list of some sort. If
you've no idea what I'm talking about, you probably don't need several
simultaneous timers in the first place.
- Signals are NOT an event driven framework - it is easy to get
carried away and try turning the signals system into an event-driven
driver for a program, but signal handling functions were not meant for
that. If you need such a thing, use some framework that is more
suitable for the application (e.g. use an event loop of a windowing
program, use a select-based loop inside a network server, etc.).
|