Search     or:     and:
 LINUX 
 Language 
 Kernel 
 Package 
 Book 
 Test 
 OS 
 Forum 
iakovlev.org

Autoconf, Automake, and Libtool

Авторы :

4. Introducing `Makefile's

A `Makefile' is a specification of dependencies between files and how to resolve those dependencies such that an overall goal, known as a target, can be reached. `Makefile's are processed by the make utility. Other references describe the syntax of `Makefile's and the various implementations of make in detail. This chapter provides an overview into `Makefile's and gives just enough information to write custom rules in a `Makefile.am'

4.1 Targets and dependencies

The make program attempts to bring a target up to date by bring all of the target's dependencies up to date. These dependencies may have further dependencies. Thus, a potentially complex dependency graph forms when processing a typical `Makefile'. From a simple `Makefile' that looks like this:

 
all: foo
 
 foo: foo.o bar.o baz.o
 
 .c.o:
         $(CC) $(CFLAGS) -c $< -o $@
 
 .l.c:
         $(LEX) $< && mv lex.yy.c $@
 

We can draw a dependency graph that looks like this:

 
               all
                 |
                foo
                 |
         .-------+-------.
        /        |        \
     foo.o     bar.o     baz.o
       |         |         |
     foo.c     bar.c     baz.c
                           |
                         baz.l
 

Unless the `Makefile' contains a directive to make, all targets are assumed to be filename and rules must be written to create these files or somehow bring them up to date.

When leaf nodes are found in the dependency graph, the `Makefile' must include a set of shell commands to bring the dependent up to date with the dependency. Much to the chagrin of many make users, up to date means the dependent has a more recent timestamp than the target. Moreover, each of these shell commands are run in their own sub-shell and, unless the `Makefile' instructs make otherwise, each command must exit with an exit code of 0 to indicate success.

Target rules can be written which are executed unconditionally. This is achieved by specifying that the target has no dependents. A simple rule which should be familiar to most users is:

4.2 Makefile syntax

`Makefile's have a rather particular syntax that can trouble new users. There are many implementations of make, some of which provide non-portable extensions. An abridged description of the syntax follows which, for portability, may be stricter than you may be used to.

Comments start with a `#' and continue until the end of line. They may appear anywhere except in command sequences--if they do, they will be interpreted by the shell running the command. The following `Makefile' shows three individual targets with dependencies on each:

 
target1:  dep1 dep2 ... depN
 <tab>	  cmd1
 <tab>	  cmd2
 <tab>	  ...
 <tab>	  cmdN
 
 target2:  dep4 dep5
 <tab>	  cmd1
 <tab>	  cmd2
 
 dep4 dep5:
 <tab>	  cmd1
 

Target rules start at the beginning of a line and are followed by a colon. Following the colon is a whitespace separated list of dependencies. A series of lines follow which contain shell commands to be run by a sub-shell (the default is the Bourne shell). Each of these lines must be prefixed by a horizontal tab character. This is the most common mistake made by new make users.

These commands may be prefixed by an `@' character to prevent make from echoing the command line prior to executing it. They may also optionally be prefixed by a `-' character to allow the rule to continue if the command returns a non-zero exit code. The combination of both characters is permitted.

4.3 Macros

A number of useful macros exist which may be used anywhere throughout the `Makefile'. Macros start with a dollar sign, like shell variables. Our first `Makefile' used a few:

 
$(CC) $(CFLAGS) -c $< -o $@
 

Here, syntactic forms of `$(..)' are make variable expansions. It is possible to define a make variable using a `var=value' syntax:

 
CC = ec++
 

In a `Makefile', $(CC) will then be literally replaced by `ec++'. make has a number of built-in variables and default values. The default value for `$(CC)' is cc.

Other built-in macros exist with fixed semantics. The two most common macros are $@ and $<. They represent the names of the target and the first dependency for the rule in which they appear. $@ is available in any rule, but for some versions of make $< is only available in suffix rules. Here is a simple `Makefile':

 
all:    dummy
 	@echo "$@ depends on dummy"
 
 dummy:
 	touch $@
 

This is what make outputs when processing this `Makefile':

 
$ make
 touch dummy
 all depends on dummy
 

The GNU Make manual documents these macros in more detail.

4.4 Suffix rules

To simplify a `Makefile', there is a special kind of rule syntax known as a suffix rule. This is a wildcard pattern that can match targets. Our first `Makefile' used some. Here is one:

 
.c.o:
 	$(CC) $(CFLAGS) -c $< -o $@
 

Unless a more specific rule matches the target being sought, this rule will match any target that ends in `.o'. These files are said to always be dependent on `.c'. With some background material now presented, let's take a look at these tools in use.

5. A Minimal GNU Autotools Project

This chapter describes how to manage a minimal project using the GNU Autotools. A minimal project is defined to be the smallest possible project that can still illustrate a sufficient number of principles in using the tools. By studying a smaller project, it becomes easier to understand the more complex interactions between these tools when larger projects require advanced features.

The example project used throughout this chapter is a fictitious command interpreter called foonly. foonly is written in C, but like many interpreters, uses a lexical analyzer and a parser expressed using the lex and yacc tools. The package will be developed to adhere to the GNU `Makefile' standard, which is the default behavior for Automake.

There are many features of the GNU Autotools that this small project will not utilize. The most noteworthy one is libraries; this package does not produce any libraries of its own, so Libtool will not feature in this chapter. The more complex projects presented in 9. A Small GNU Autotools Project and 12. A Large GNU Autotools Project will illustrate how Libtool participates in the build system. The purpose of this chapter will be to provide a high-level overview of the user-written files and how they interact.

5.1 User-Provided Input Files

The smallest project requires the user to provide only two files. The remainder of the files needed to build the package are generated by the GNU Autotools (see section 5.2 Generated Output Files).

  • `Makefile.am' is an input to automake.
  • `configure.in' is an input to autoconf.

I like to think of `Makefile.am' as a high-level, bare-bones specification of a project's build requirements: what needs to be built, and where does it go when it is installed? This is probably Automake's greatest strength--the description is about as simple as it could possibly be, yet the final product is a `Makefile' with an array of convenient make targets.

The `configure.in' is a template of macro invocations and shell code fragments that are used by autoconf to produce a `configure' script (see section C. Generated File Dependencies). autoconf copies the contents of `configure.in' to `configure', expanding macros as they occur in the input. Other text is copied verbatim.

Let's take a look at the contents of the user-provided input files that are relevant to this minimal project. Here is the `Makefile.am':

 
bin_PROGRAMS = foonly
 foonly_SOURCES = main.c foo.c foo.h nly.c scanner.l parser.y
 foonly_LDADD = @LEXLIB@
 

This `Makefile.am' specifies that we want a program called `foonly' to be built and installed in the `bin' directory when make install is run. The source files that are used to build `foonly' are the C source files `main.c', `foo.c', `nly.c' and `foo.h', the lex program in `scanner.l' and a yacc grammar in `parser.y'. This points out a particularly nice aspect about Automake: because lex and yacc both generate intermediate C programs from their input files, Automake knows how to build such intermediate files and link them into the final executable. Finally, we must remember to link a suitable lex library, if `configure' concludes that one is needed.

And here is the `configure.in':

 
dnl Process this file with autoconf to produce a configure script.
 AC_INIT(main.c)
 AM_INIT_AUTOMAKE(foonly, 1.0)
 AC_PROG_CC
 AM_PROG_LEX
 AC_PROG_YACC
 AC_OUTPUT(Makefile)
 

This `configure.in' invokes some mandatory Autoconf and Automake initialization macros, and then calls on some Autoconf macros from the AC_PROG family to find suitable C compiler, lex, and yacc programs. Finally, the AC_OUTPUT macro is used to cause the generated `configure' script to output a `Makefile'---but from what? It is processed from `Makefile.in', which Automake produces for you based on your `Makefile.am' (see section 5.2 C. Generated File Dependencies).

5.2 Generated Output Files

By studying the diagram in C. Generated File Dependencies, it should be possible to see which commands must be run to generate the required output files from the input files shown in the last section.

First, we generate `configure':

 
$ aclocal
 $ autoconf
 

Because `configure.in' contains macro invocations which are not known to autoconf itself--AM_INIT_AUTOMAKE being a case in point, it is necessary to collect all of the macro definitions for autoconf to use when generating `configure'. This is done using the aclocal program, so called because it generates `aclocal.m4' (see section C. Generated File Dependencies). If you were to examine the contents of `aclocal.m4', you would find the definition of the AM_INIT_AUTOMAKE macro contained within.

After running autoconf, you will find a `configure' script in the current directory. It is important to run aclocal first because automake relies on the contents of `configure.in' and `aclocal.m4'. On to automake:

 
$ automake --add-missing
 automake: configure.in: installing ./install-sh
 automake: configure.in: installing ./mkinstalldirs
 automake: configure.in: installing ./missing
 automake: Makefile.am: installing ./INSTALL
 automake: Makefile.am: required file ./NEWS not found
 automake: Makefile.am: required file ./README not found
 automake: Makefile.am: installing ./COPYING
 automake: Makefile.am: required file ./AUTHORS not found
 automake: Makefile.am: required file ./ChangeLog not found
 

The `--add-missing' option copies some boilerplate files from your Automake installation into the current directory. Files such as `COPYING', which contain the GNU General Public License change infrequently, and so can be generated without user intervention. A number of utility scripts are also installed--these are used by the generated `Makefile's, particularly by the install target. Notice that some required files are still missing. These are:

`NEWS'
A record of user-visible changes to a package. The format is not strict, but the changes to the most recent version should appear at the top of the file.

`README'
The first place a user will look to get an overview for the purpose of a package, and perhaps special installation instructions.

`AUTHORS'
Lists the names, and usually mail addresses, of individuals who worked on the package.

`ChangeLog'
The ChangeLog is an important file--it records the changes that are made to a package. The format of this file is quite strict (see section 5.5 Documentation and ChangeLogs).

For now, we'll do enough to placate Automake:

 
$ touch NEWS README AUTHORS ChangeLog
 $ automake --add-missing
 

Automake has now produced a `Makefile.in'. At this point, you may wish to take a snapshot of this directory before we really let loose with automatically generated files.

By now, the contents of the directory will be looking fairly complete and reminiscent of the top-level directory of a GNU package you may have installed in the past:

 
AUTHORS	   INSTALL      NEWS        install-sh    mkinstalldirs
 COPYING    Makefile.am  README      configure     missing 
 ChangeLog  Makefile.in  aclocal.m4  configure.in
 

It should now be possible to package up your tree in a tar file and give it to other users for them to install on their own systems. One of the make targets that Automake generates in `Makefile.in' makes it easy to generate distributions . A user would merely have to unpack the tar file, run configure (see section 3. How to run configure and make) and finally type make all:

 
$ ./configure
 creating cache ./config.cache
 checking for a BSD compatible install... /usr/bin/install -c
 checking whether build environment is sane... yes
 checking whether make sets ${MAKE}... yes
 checking for working aclocal... found
 checking for working autoconf... found
 checking for working automake... found
 checking for working autoheader... found
 checking for working makeinfo... found
 checking for gcc... gcc
 checking whether the C compiler (gcc  ) works... yes
 checking whether the C compiler (gcc  ) is a cross-compiler... no
 checking whether we are using GNU C... yes
 checking whether gcc accepts -g... yes
 checking how to run the C preprocessor... gcc -E
 checking for flex... flex
 checking for flex... (cached) flex
 checking for yywrap in -lfl... yes
 checking lex output file root... lex.yy
 checking whether yytext is a pointer... yes
 checking for bison... bison -y
 updating cache ./config.cache
 creating ./config.status
 creating Makefile
 
 $ make all
 gcc -DPACKAGE=\"foonly\" -DVERSION=\"1.0\" -DYYTEXT_POINTER=1  -I. -I. \
   -g -O2 -c main.c
 gcc -DPACKAGE=\"foonly\" -DVERSION=\"1.0\" -DYYTEXT_POINTER=1  -I. -I. \
   -g -O2 -c foo.c
 flex   scanner.l && mv lex.yy.c scanner.c
 gcc -DPACKAGE=\"foonly\" -DVERSION=\"1.0\" -DYYTEXT_POINTER=1  -I. -I. \
   -g -O2 -c scanner.c
 bison -y   parser.y && mv y.tab.c parser.c
 if test -f y.tab.h; then \
   if cmp -s y.tab.h parser.h; then rm -f y.tab.h; \
   else mv y.tab.h parser.h; fi; \
 else :; fi
 gcc -DPACKAGE=\"foonly\" -DVERSION=\"1.0\" -DYYTEXT_POINTER=1  -I. -I. \
   -g -O2 -c parser.c
 gcc  -g -O2  -o foonly  main.o foo.o scanner.o parser.o -lfl 
 

5.3 Maintaining Input Files

If you edit any of the GNU Autotools input files in your package, it is necessary to regenerate the machine generated files for these changes to take effect. For instance, if you add a new source file to the foonly_SOURCES variable in `Makefile.am'. It is necessary to re-generate the derived file `Makefile.in'. If you are building your package, you need to re-run configure to re-generate the site-specific `Makefile', and then re-run make to compile the new source file and link it into `foonly'.

It is possible to regenerate these files by running the required tools, one at a time. However, as we can see above, it can be difficult to compute the dependencies--does a particular change require aclocal to be run? Does a particular change require autoconf to be run? There are two solutions to this problem.

The first solution is to use the autoreconf command. This tool regenerates all derived files by re-running all of the necessary tools in the correct order. It is somewhat of a brute force solution, but it works very well, particularly if you are not trying to accommodate other maintainers, or regular maintenance that would render this command bothersome.

The alternative is Automake's `maintainer mode'. By invoking the AM_MAINTAINER_MODE macro from `configure.in', automake will activate an `--enable-maintainer-mode' option in `configure'. This is explained at length in 8. Bootstrapping.

5.4 Packaging Generated Files

The debate about what to do with generated files is one which is keenly contested on the relevant Internet mailing lists. There are two points of view and I will present both of them to you so that you can try to decide what the best policy is for your project.

One argument is that generated files should not be included with a package, but rather only the `preferred form' of the source code should be included. By this definition, `configure' is a derived file, just like an object file, and it should not be included in the package. Thus, the user should use the GNU Autotools to bootstrap themselves prior to building the package. I believe there is some merit to this purist approach, as it discourages the practice of packaging derived files.

The other argument is that the advantages of providing these files can far outweigh the violation of good software engineering practice mentioned above. By including the generated files, users have the convenience of not needing to be concerned with keeping up to date with all of the different versions of the tools in active use. This is especially true for Autoconf, as `configure' scripts are often generated by maintainers using locally modified versions of autoconf and locally installed macros. If `configure' were regenerated by the user, the result could be different to that intended. Of course, this is poor practice, but it happens to reflect reality.

I believe the answer is to include generated files in the package when the package is going to be distributed to a wide user community (ie. the general public). For in-house packages, the former argument might make more sense, since the tools may also be held under version control.

5.5 Documentation and ChangeLogs

As with any software project, it is important to maintain documentation as the project evolves--the documentation must reflect the current state of the software, but it must also accurately record the changes that have been made in the past. The GNU coding standard rigorously enforces the maintenance of documentation. Automake, in fact, implements some of the standard by checking for the presence of a `ChangeLog' file when automake is run!

A number of files exist, with standardized filenames, for storing documentation in GNU packages. The complete GNU coding standard, which offers some useful insights, can be found at http://www.gnu.org/prep/standards.html.

Other projects, including in-house projects, can use these same tried-and-true techniques. The purpose of most of the standard documentation files was outlined earlier See section 5.2 Generated Output Files, but the `ChangeLog' deserves additional treatment.

When recording changes in a `ChangeLog', one entry is made per person. Logical changes are grouped together, while logically distinct changes (ie. `change sets') are separated by a single blank line. Here is an example from Automake's own `ChangeLog':

 
1999-11-21  Tom Tromey  <tromey@cygnus.com>
 
         * automake.in (finish_languages): Only generate suffix rule
         when not doing dependency tracking.
 
         * m4/init.m4 (AM_INIT_AUTOMAKE): Use AM_MISSING_INSTALL_SH.
         * m4/missing.m4 (AM_MISSING_INSTALL_SH): New macro.
 
         * depend2.am: Use @SOURCE@, @OBJ@, @LTOBJ@, @OBJOBJ@,
         and @BASE@.  Always use -o.
 

Another important point to make about `ChangeLog' entries is that they should be brief. It is not necessary for an entry to explain in details why a change was made, but rather what the change was. If a change is not straightforward then the explanation of why belongs in the source code itself. The GNU coding standard offers the complete set of guidelines for keeping `ChangeLog's. Although any text editor can be used to create ChangeLog entries, Emacs provides a major mode to help you write them.

6. Writing `configure.in'

Writing a portable `configure.in' is a tricky business. Since you can put arbitrary shell code into `configure.in', your options seem overwhelming. There are many questions the first-time Autoconf user asks: What constructs are portable and what constructs aren't portable? How do I decide what to check for? What shouldn't I check for? How do I best use Autoconf's features? What shouldn't I put in `configure.in'? In what order should I run my checks? When should I look at the name of the system instead of checking for specific features?

6.1 What is Portability?

Before we talk about the mechanics of deciding what to check for and how to check for it, let's ask ourselves a simple question: what is portability? Portability is a quality of the code that enables it to be built and run on a variety of platforms. In the Autoconf context, portability usually refers to the ability to run on Unix-like systems--sometimes including Windows.

When I first started using Autoconf, I had a hard time deciding what to check for in my `configure.in'. At the time, I was maintaining a proprietary program that ran only on SunOS 4. However, I was interested in porting it to Solaris, OSF/1, and possibly Irix.

The approach I took, while workable, was relatively time-consuming and painful: I wrote a minimal `configure.in' and then proceeded to simply try to build my program on Solaris. Each time I encountered a build problem, I updated `configure.in' and my source and started again. Once it built correctly, I started testing to see if there were runtime problems related to portability.

Since I didn't start with a relatively portable base, and since I was unaware of the tools available to help with adding Autoconf support to a package (see section 24. Migrating an Existing Package to GNU Autotools), it was much more difficult than it had to be. If at all possible, it is better to write portable code to begin with.

There are a large number of Unix-like systems in the world, including many systems which, while still running, can only be considered obsolete. While it is probably possible to port some programs to all such systems, typically it isn't useful to even try. Porting to everything is a difficult process, especially given that it usually isn't possible to test on all platforms, and that new operating systems, with their own bugs and idiosyncracies are released every year.

We advocate a pragmatic approach to portability: we write our programs to target a fairly large, but also fairly modern, cross-section of Unix-like systems. As deficiencies are discovered in our portability framework, we update `configure.in' and our sources, and move on. In practice, this is an effective approach.

6.2 Brief introduction to portable sh

If you read a number of `configure.in's, you'll quickly notice that they tend to be written in an unusual style. For instance, you'll notice you hardly ever see the `[' program used; instead you'll see `test' invoked. We won't go into all the details of writing a portable shell script here; instead we leave that for 22. Writing Portable Bourne Shell.

Like other aspects of portability, the approach you take to writing shell scripts in `configure.in' and `Makefile.am' should depend on your goals. Some platforms have notoriously broken sh implementations. For instance, Ultrix sh doesn't implement unset. Of course, the GNU Autotools are written in the most portable style possible, so as not to limit your possibilities.

Also, it doesn't really make sense to talk about portable sh programming in the abstract. sh by itself does very little; most actual work is done by separate programs, each with its own potential portability problems. For instance, some options are not portable between systems, and some seemingly common programs don't exist on every system -- so not only do you have to know which sh constructs are not portable, but you also must know which programs you can (and cannot) use, and which options to those programs are portable.

This seems daunting, but in practice it doesn't seem to be too hard to write portable shell scripts -- once you've internalized the rules. Unfortunately, this process can take a long time. Meanwhile, a pragmatic `try and see' approach, while noting other portable code you've seen elsewhere, works fairly well. Once again, it pays to be aware of which architectures you'll probably care about -- you will make different choices if you are writing an extremely portable program like emacs or gcc than if you are writing something that will only run on various flavors of Linux. Also, the cost of having unportable code in `configure.in' is relatively low -- in general it is fairly easy to rewrite pieces on demand as unportable constructs are found.

6.3 Ordering Tests

In addition to the problem of writing portable sh code, another problem which confronts first-time `configure.in' writers is determining the order in which to run the various tests. Autoconf indirectly (via the autoscan program, which we cover in 24. Migrating an Existing Package to GNU Autotools) suggests a standard ordering, which is what we describe here.

The standard ordering is:

  1. Boilerplate. This section should include standard boilerplate code, such as the call to AC_INIT (which must be first), AM_INIT_AUTOMAKE, AC_CONFIG_HEADER, and perhaps AC_REVISION.

  2. Options. The next section should include macros which add command-line options to configure, such as AC_ARG_ENABLE. It is typical to put support code for the option in this section as well, if it is short enough, like this example from libgcj:

     
    AC_ARG_ENABLE(getenv-properties,
     [  --disable-getenv-properties
                               don't set system properties from GCJ_PROPERTIES])
     
     dnl Whether GCJ_PROPERTIES is used depends on the target.
     if test -n "$enable_getenv_properties"; then
        enable_getenv_properties=${enable_getenv_properties_default-yes}
     fi
     if test "$enable_getenv_properties" = no; then
        AC_DEFINE(DISABLE_GETENV_PROPERTIES)
     fi
     

  3. Programs. Next it is traditional to check for programs that are either needed by the configure process, the build process, or by one of the programs being built. This usually involves calls to macros like AC_CHECK_PROG and AC_PATH_TOOL.

  4. Libraries. Checks for libraries come before checks for other objects visible to C (or C++, or anything else). This is necessary because some other checks work by trying to link or run a program; by checking for libraries first you ensure that the resulting programs can be linked.

  5. Headers. Next come checks for existence of headers.

  6. Typedefs and structures. We do checks for typedefs after checking for headers for the simple reason that typedefs appear in headers, and we need to know which headers we can use before we look inside them.

  7. Functions. Finally we check for functions. These come last because functions have dependencies on the preceding items: when searching for functions, libraries are needed in order to correctly link, headers are needed in order to find prototypes (this is especially important for C++, which has stricter prototyping rules than C), and typedefs are needed for those functions which use or return types which are not built in.

  8. Output. This is done by invoking AC_OUTPUT.

This ordering should be considered a rough guideline, and not a list of hard-and-fast rules. Sometimes it is necessary to interleave tests, either to make `configure.in' easier to maintain, or because the tests themselves do need to be in a different order. For instance, if your project uses both C and C++ you might choose to do all the C++ checks after all the C checks are done, in order to make `configure.in' a bit easier to read.

6.4 What to check for

Deciding what to check for is really the central part of writing `configure.in'. Once you've read the Autoconf reference manual, the "how"s of writing a particular test should be fairly clear. The "when"s might remain a mystery -- and it's just as easy to check for too many things as it is to check for too few.

One notable area of divergence between various Unix-like systems is that the same programs don't exist on all systems, and, even when they do, they don't always work in the same way. For these problems we recommend, when possible, following the advice of the GNU Coding Standards: use the most common options from a relatively limited set of programs. Failing that, try to stick to programs and options specified by POSIX, perhaps augmenting this approach by doing checks for known problems on platforms you care about.

Checking for tools and their differences is usually a fairly small part of a `configure' script; more common are checks for functions, libraries, and the like.

Except for a few core libraries like `libc' and, usually, `libm' and libraries like `libX11' which typically aren't considered system libraries, there isn't much agreement about library names or contents between Unix systems. Still, libraries are easy to handle, because decisions about libraries almost always only affect the various `Makefile's. That means that checking for another library typically doesn't require major (or even, sometimes, any) changes to the source code. Also, because adding a new library test has a small impact on the development cycle -- effectively just re-running `configure' and then a relink -- you can effectively adopt a lax approach to libraries. For instance, you can just make things work on the few systems you immediately care about and then handle library changes on an as-needed basis.

Suppose you do end up with a link problem. How do you handle it? The first thing to do is use nm to look through the system libraries to see if the missing function exists. If it does, and it is in a library you can use then the solution is easy -- just add another AC_CHECK_LIB. Note that just finding the function in a library is not enough, because on some systems, some "standard" libraries are undesirable; `libucb' is the most common example of a library which you should avoid.

If you can't find the function in a system library then you have a somewhat more difficult problem: a non-portable function. There are basically three approaches to a missing function. Below we talk about functions, but really these same approaches apply, more or less, to typedefs, structures, and global variables.

The first approach is to write a replacement function and either conditionally compile it, or put it into an appropriately-named file and use AC_REPLACE_FUNCS. For instance, Tcl uses AC_REPLACE_FUNCS(strstr) to handle systems that have no strstr function.

The second approach is used when there is a similar function with a different name. The idea here is to check for all the alternatives and then modify your source to use whichever one might exist. The idiom here is to use break in the second argument to AC_CHECK_FUNCS; this is used both to skip unnecessary tests and to indicate to the reader that these checks are related. For instance, here is how libgcj checks for inet_aton or inet_addr; it only uses the first one found:

 
AC_CHECK_FUNCS(inet_aton inet_addr, break)
 

Code to use the results of these checks looks something like:

 
#if HAVE_INET_ATON
   ... use inet_aton here
 #else
 #if HAVE_INET_ADDR
   ... use inet_addr here
 #else
 #error Function missing!
 #endif
 #endif
 

Note how we've made it a compile-time error if the function does not exist. In general it is best to make errors occur as early as possible in the build process.

The third approach to non-portable functions is to write code such that these functions are only optionally used. For instance, if you are writing an editor you might decide to use mmap to map a file into the editor's memory. However, since mmap is not portable, you would also write a function to use the more portable read.

Handling known non-portable functions is only part of the problem, however. The pragmatic approach works fairly well, but it is somewhat inefficient if you are primarily developing on a more modern system, like GNU/Linux, which has few functions missing. In this case the problem is that you might not notice non-portable constructs in your code until it has largely been finished.

Unfortunately, there's no high road to solving this problem. In the end, you need to have a working knowledge of the range of existing Unix systems. Knowledge of standards such as POSIX and XPG can be useful here, as a first cut -- if it isn't in POSIX, you should at least consider checking for it. However, standards are not a panacea -- not all systems are POSIX compliant, and sometimes there are bugs in systems functions which you must work around.

One final class of problems you might encounter is that it is also easy to check for too much. This is bad because it adds unnecessary maintenance burden to your program. For instance, sometimes you'll see code that checks for <sys/types.h>. However, there's no point in doing that -- using this header is mostly portable. Again, this can only be addressed by having a practical knowledge, which is only really possible by examining your target systems.

6.5 Using Configuration Names

While feature tests are definitely the best approach, a `configure' script may occasionally have to make a decision based on a configuration name. This may be necessary if certain code must be compiled differently based on something which can not be tested using a standard Autoconf feature test. For instance, the expect package needs to find information about the system's `tty' implementation; this can't reliably be done when cross compiling without examining the particular configuration name.

It is normally better to test for particular features, rather than to test for a particular system type. This is because as Unix and other operating systems evolve, different systems copy features from one another.

When there is no alternative to testing the configuration name in a `configure' script, it is best to define a macro which describes the feature, rather than defining a macro which describes the particular system. This permits the same macro to be used on other systems which adopt the same feature (see section 23. Writing New Macros for Autoconf).

Testing for a particular system is normally done using a case statement in the autoconf `configure.in' file. The case statement might look something like the following, assuming that `host' is a shell variable holding a canonical configuration system--which will be the case if `configure.in' uses the `AC_CANONICAL_HOST' or `AC_CANONICAL_SYSTEM' macros.

 
case "${host}" in
 i[[3456]]86-*-linux-gnu*) do something ;;
 sparc*-sun-solaris2.[[56789]]*) do something ;;
 sparc*-sun-solaris*) do something ;;
 mips*-*-elf*) do something ;;
 esac
 

Note the doubled square brackets in this piece of code. These are used to work around an ugly implementation detail of autoconf---it uses M4 under the hood. Without these extra brackets, the square brackets in the case statement would be swallowed by M4, and would not appear in the resulting `configure'. This nasty detail is discussed at more length in 21. M4.

It is particularly important to use `*' after the operating system field, in order to match the version number which will be generated by `config.guess'. In most cases you must be careful to match a range of processor types. For most processor families, a trailing `*' suffices, as in `mips*' above. For the i386 family, something along the lines of `i[34567]86' suffices at present. For the m68k family, you will need something like `m68*'. Of course, if you do not need to match on the processor, it is simpler to just replace the entire field by a `*', as in `*-*-irix*'.

7. Introducing GNU Automake

The primary goal of Automake is to generate `Makefile.in's compliant with the GNU Makefile Standards. Along the way, it tries to remove boilerplate and drudgery. It also helps the `Makefile' writer by implementing features (for instance automatic dependency tracking and parallel make support) that most maintainers don't have the patience to implement by hand. It also implements some best practices as well as workarounds for vendor make bugs -- both of which require arcane knowledge not generally available.

A secondary goal for Automake is that it work well with other free software, and, specifically, GNU tools. For example, Automake has support for Dejagnu-based test suites.

Chances are that you don't care about the GNU Coding Standards. That's okay. You'll still appreciate the convenience that Automake provides, and you'll find that the GNU standards compliance feature, for the most part, assists rather than impedes.

Automake helps the maintainer with five large tasks, and countless minor ones. The basic functional areas are:

  1. Build

  2. Check

  3. Clean

  4. Install and uninstall

  5. Distribution

We cover the first three items in this chapter, and the others in later chapters. Before we get into the details, let's talk a bit about some general principles of Automake.

7.1 General Automake principles

Automake at its simplest turns a file called `Makefile.am' into a GNU-compliant `Makefile.in' for use with `configure'. Each `Makefile.am' is written according to make syntax; Automake recognizes special macro and target names and generates code based on these.

There are a few Automake rules which differ slightly from make rules:

  • Ordinary make comments are passed through to the output, but comments beginning with `##' are Automake comments and are not passed through.

  • Automake supports include directives. These directives are not passed through to the `Makefile.in', but instead are processed by automake -- files included this way are treated as if they were textually included in `Makefile.am' at that point. This can be used to add boilerplate to each `Makefile.am' in a project via a centrally-maintained file. The filename to include can start with `$(top_srcdir)' to indicate that it should be found relative to the top-most directory of the project; if it is a relative path or if it starts with `$(srcdir)' then it is relative to the current directory. For example, here is how you would reference boilerplate code from the file `config/Make-rules' (where `config' is a top-level dirctory in the project):

     
    include $(top_srcdir)/config/Make-rules
     

  • Automake supports conditionals which are not passed directly through to `Makefile.in'. This feature is discussed in 19. Advanced GNU Automake Usage.

  • Automake supports macro assignment using `+='; these assignments are translated by Automake into ordinary `=' assignments in `Makefile.in'.

All macros and targets, including those which Automake does not recognize, are passed through to the generated `Makefile.in' -- this is a powerful extension mechanism. Sometimes Automake will define macros or targets internally. If these are also defined in `Makefile.am' then the definition in `Makefile.am' takes precedence. This feature provides an easy way to tailor specific parts of the output in small ways.

Note, however, that it is a mistake to override parts of the generated code that aren't documented (and thus `exported' by Automake). Overrides like this stand a good chance of not working with future Automake releases.

Automake also scans `configure.in'. Sometimes it uses the information it discovers to generate extra code, and sometimes to provide extra error checking. Automake also turns every AC_SUBST into a `Makefile' variable. This is convenient in more ways than one: not only does it mean that you can refer to these macros in `Makefile.am' without extra work, but, since Automake scans `configure.in' before it reads any `Makefile.am', it also means that special variables and overrides Automake recognizes can be defined once in `configure.in'.

7.2 Introduction to Primaries

Each type of object that Automake understands has a special root variable name associated with it. This root is called a primary. Many actual variable names put into `Makefile.am' are constructed by adding various prefixes to a primary.

For instance, scripts--interpreted executable programs--are associated with the SCRIPTS primary. Here is how you would list scripts to be installed in the user's `bindir':

 
bin_SCRIPTS = magic-script
 

(Note that the mysterious `bin_' prefix will be discussed later.)

The contents of a primary-derived variable are treated as targets in the resulting `Makefile'. For instance, in our example above, we could generate `magic-script' using sed by simply introducing it as a target:

 
bin_SCRIPTS = magic-script
 
 magic-script: magic-script.in
 	sed -e 's/whatever//' < $(srcdir)/magic-script.in > magic-script
 	chmod +x magic-script
 

7.3 The easy primaries

This section describes the common primaries that are relatively easy to understand; the more complicated ones are discussed in the next section.

DATA
This is the easiest primary to understand. A macro of this type lists a number of files which are installed verbatim. These files can appear either in the source directory or the build directory.

HEADERS
Macros of this type list header files. These are separate from DATA macros because this allows for extra error checking in some cases.

SCRIPTS
This is used for executable scripts (interpreted programs). These are different from DATA because they are installed with different permissions and because they have the program name transform applied to them (e.g., the `--program-transform-name' argument to configure). Scripts are also different from compiled programs because the latter can be stripped while scripts cannot.

MANS
This lists man pages. Installing man pages is more complicated than you might think due to the lack of a single common practice. One developer might name a man page in the source tree `foo.man' and then rename to the real name (`foo.1') at install time. Another developer might instead use numeric suffixes in the source tree and install using the same name. Sometimes an alphabetic code follows the numeric suffix (e.g., `quux.3n'); this code must be stripped before determining the correct install directory (this file must still be installed in `$(man3dir)'). Automake supports all of these modes of operation:

  • man_MANS can be used when numeric suffixes are already in place:
     
    man_MANS = foo.1 bar.2 quux.3n
     

  • man1_MANS, man2_MANS, etc., can be used to force renaming at install time. This renaming is skipped if the suffix already begins with the correct number. For instance:
     
    man1_MANS = foo.man
     man3_MANS = quux.3n
     
    Here `foo.man' will be installed as `foo.1' but `quux.3n' will keep its name at install time.

TEXINFOS
GNU programs traditionally use the Texinfo documentation format, not man pages. Automake has full support for Texinfo, including some additional features such as versioning and install-info support. We won't go into that here except to mention that it exists. See the Automake reference manual for more information.

Automake supports a variety of lesser-used primaries such as JAVA and LISP (and, in the next major release, PYTHON). See the reference manual for more information on these.

7.4 Programs and libraries

The preceding primaries have all been relatively easy to use. Now we'll discuss a more complicated set, namely those used to build programs and libraries. These primaries are more complex because building a program is more complex than building a script (which often doesn't even need building at all).

Use the PROGRAMS primary for programs, LIBRARIES for libraries, and LTLIBRARIES for Libtool libraries (see section 10. Introducing GNU Libtool). Here is a minimal example:

 
bin_PROGRAMS = doit
 

This creates the program doit and arranges to install it in bindir. First make will compile `doit.c' to produce `doit.o'. Then it will link `doit.o' to create `doit'.

Of course, if you have more than one source file, and most programs do, then you will want to be able to list them somehow. You will do this via the program's SOURCES variable. Each program or library has a set of associated variables whose names are constructed by appending suffixes to the `normalized' name of the program. The normalized name is the name of the object with non-alphanumeric characters changed to underscores. For instance, the normalized name of `quux' is `quux', but the normalized name of `install-info' is `install_info'. Normalized names are used because they correspond to make syntax, and, like all macros, Automake propagates these definitions into the resulting `Makefile.in'.

So if `doit' is to be built from files `main.c' and `doit.c', we would write:

 
bin_PROGRAMS = doit
 doit_SOURCES = doit.c main.c
 

The same holds for libraries. In the zlib package we might make a library called `libzlib.a'. Then we would write:

 
lib_LIBRARIES = libzlib.a
 libzlib_a_SOURCES = adler32.c compress.c crc32.c deflate.c deflate.h \
 gzio.c infblock.c infblock.h infcodes.c infcodes.h inffast.c inffast.h \
 inffixed.h inflate.c inftrees.c inftrees.h infutil.c infutil.h trees.c \
 trees.h uncompr.c zconf.h zlib.h zutil.c zutil.h
 

We can also do this with libtool libraries. For instance, suppose we want to build `libzlib.la' instead:

 
lib_LTLIBRARIES = libzlib.la
 libzlib_la_SOURCES = adler32.c compress.c crc32.c deflate.c deflate.h \
 gzio.c infblock.c infblock.h infcodes.c infcodes.h inffast.c inffast.h \
 inffixed.h inflate.c inftrees.c inftrees.h infutil.c infutil.h trees.c \
 trees.h uncompr.c zconf.h zlib.h zutil.c zutil.h
 

As you can see, making shared libraries with Automake and Libtool is just as easy as making static libraries.

In the above example, we listed header files in the SOURCES variable. These are ignored (except by make dist (1)) but can serve to make your `Makefile.am' a bit clearer (and sometimes shorter, if you aren't installing headers).

Note that you can't use `configure' substitutions in a SOURCES variable. Automake needs to know the static list of files which can be compiled into your program. There are still various ways to conditionally compile files, for instance Automake conditionals or the use of the LDADD variable.

The static list of files is also used in some versions of Automake's automatic dependency tracking. The general rule is that each source file which might be compiled should be listed in some SOURCES variable. If the source is conditionally compiled, it can be listed in an EXTRA variable. For instance, suppose in this example `@FOO_OBJ@' is conditionally set by `configure' to `foo.o' when `foo.c' should be compiled:

 
bin_PROGRAMS = foo
 foo_SOURCES = main.c
 foo_LDADD = @FOO_OBJ@
 foo_DEPENDENCIES = @FOO_OBJ@
 EXTRA_foo_SOURCES = foo.c
 

In this case, `EXTRA_foo_SOURCES' is used to list sources which are conditionally compiled; this tells Automake that they exist even though it can't deduce their existence automatically.

In the above example, note the use of the `foo_LDADD' macro. This macro is used to list other object files and libraries which should be linked into the foo program. Each program or library has several such associated macros which can be used to customize the link step; here we list the most common ones:

`_DEPENDENCIES'
Extra dependencies which are added to the program's dependency list. If not specified, this is automatically computed based on the value of the program's `_LDADD' macro.

`_LDADD'
Extra objects which are passed to the linker. This is only used by programs and shared libraries.

`_LDFLAGS'
Flags which are passed to the linker. This is separate from `_LDADD' to allow `_DEPENDENCIES' to be auto-computed.

`_LIBADD'
Like `_LDADD', but used for static libraries and not programs.

You aren't required to define any of these macros.

7.5 Frequently Asked Questions

Experience has shown that there are several common questions that arise as people begin to use automake for their own projects. It seemed prudent to mention these issues here.

Users often want to make a library (or program, but for some reason it comes up more frequently with libraries) whose sources live in subdirectories:

 
lib_LIBRARIES = libsub.a
 libsub_a_SOURCES = subdir1/something.c ...
 

If you try this with Automake 1.4, you'll get an error:

 
$ automake
 automake: Makefile.am: not supported: source file subdir1/something.c is in subdirectory
 

For libraries, this problem is mostly simply solve by using libtool convenience libraries. For programs, there is no simple solution. Many people elect to restructure their package in this case.

The next major release of Automake addresses this problem.

Another general problem that comes up is that of setting compilation flags. Most rules have flags--for instance, compilation of C code automatically uses `CFLAGS'. However, these variables are considered user variables. Setting them in `Makefile.am' is unsafe, because the user will expect to be able to override them at will.

To handle this, for each flag variable, Automake introduce an `AM_' version which can be set in `Makefile.am'. For instance, we could set some flags for C and C++ compilation like so:

 
AM_CFLAGS = -DFOR_C
 AM_CXXFLAGS = -DFOR_CXX
 

Finally, people often ask how to compile a single source file in two different ways. For instance, the `etags.c' file which comes with Emacs can be compiled with different `-D' options to produce the etags and ctags programs.

With Automake 1.4 this can only be done by writing your own compilation rules, like this:

 
bin_PROGRAMS = etags ctags
 etags_SOURCES = etags.c
 ctags_SOURCES =
 ctags_LDADD = ctags.o
 
 etags.o: etags.c
 	$(CC) $(CFLAGS) -DETAGS ...
 
 ctags.o: etags.c
 	$(CC) $(CFLAGS) -DCTAGS ...
 

This is tedious and hard to maintain for larger programs. Automake 1.5 will support a much more natural approach:

 
bin_PROGRAMS = etags ctags
 etags_SOURCES = etags.c
 etags_CFLAGS = -DETAGS
 ctags_SOURCES = etags.c
 ctags_CFLAGS = -DCTAGS
 

7.6 Multiple directories

So far, we've only dealt with single-directory projects. Automake can also handle projects with many directories. The variable `SUBDIRS' is used to list the subdirectories which should be built. Here is an example from Automake itself:

 
SUBDIRS = . m4 tests
 

Automake does not need to know the list of subdirectories statically, so there is no `EXTRA_SUBDIRS' variable. You might think that Automake would use `SUBDIRS' to see which `Makefile.am's to scan, but it actually gets this information from `configure.in'. This means that, if you have a subdirectory which is optionally built, you should still list it unconditionally in your call to AC_OUTPUT and then arrange for it to be substituted (or not, as appropriate) at configure time.

Subdirectories are always built in the order they appear, but cleaning rules (e.g., maintainer-clean) are always run in the reverse order. The reason for this odd reversal is that it is wrong to remove a file before removing all the files which depend on it.

You can put `.' into `SUBDIRS' to control when the objects in the current directory are built, relative to the objects in the subdirectories. In the example above, targets in `.' will be built before subdirectories are built. If `.' does not appear in `SUBDIRS', it is built following all the subdirectories.

7.7 Testing

Automake also includes simple support for testing your program.

The most simple form of this is the `TESTS' variable. This variable holds a list of tests which are run when the user runs make check. Each test is built (if necessary) and then executed. For each test, make prints a single line indicating whether the test has passed or failed. Failure means exiting with a non-zero status, with the special exception that an exit status of `77' (2) means that the test should be ignored. make check also prints a summary showing the number of passes and fails.

Automake also supports the notion of an xfail, which is a test which is expected to fail. Sometimes this is useful when you want to track a known failure, but you aren't prepared to fix it right away. Tests which are expected to fail should be listed in both `TESTS' and `XFAIL_TESTS'.

The special prefix `check' can be used with primaries to indicate that the objects should only be built at make check time. For example, here is how you can build a program that will only be used during the testing process:

 
check_PROGRAMS = test-program
 test_program_SOURCES = ...
 

Automake also supports the use of DejaGNU, the GNU test framework. DejaGNU support can be enabled using the `dejagnu' option:

 
AUTOMAKE_OPTIONS = dejagnu
 

The resulting `Makefile.in' will include code to invoke the runtest program appropriately.

8. Bootstrapping

There are many programs in the GNU Autotools, each of which has a complex set of inputs. When one of these inputs changes, it is important to run the proper programs in the proper order. Unfortunately, it is hard to remember both the dependencies and the ordering.

For instance, whenever you edit `configure.in', you must remember to re-run aclocal in case you added a reference to a new macro. You must also rebuild `configure' by running autoconf; `config.h' by running autoheader, in case you added a new AC_DEFINE; and automake to propagate any new AC_SUBSTs to the various `Makefile.in's. If you edit a `Makefile.am', you must re-run automake. In both these cases, you must then remember to re-run config.status --recheck if `configure' changed, followed by config.status to rebuild the `Makefile's.

When doing active development on the build system for your project, these dependencies quickly become painful. Of course, Automake knows how to handle this automatically. By default, automake generates a `Makefile.in' which knows all these dependencies and which automatically re-runs the appropriate tools in the appropriate order. These rules assume that the correct versions of the tools are all in your PATH.

It helps to have a script ready to do all of this for you once, before you have generated a `Makefile' that will automatically run the tools in the correct order, or when you make a fresh checkout of the code from a CVS repository where the developers don't keep generated files under source control. There are at least two opposing schools of thought regarding how to go about this -- the autogen.sh school and the bootstrap school:

autogen.sh
From the outset, this is a poor name for a bootstrap script, since there is already a GNU automatic text generation tool called AutoGen. Often packages that follow this convention have the script automatically run the generated configure script after the boostrap process, passing autogen.sh arguments through to configure. Except you don't knon what options you want yet, since you can't run `configure --help' until configure has been generated. I suggest that if you find yourself compiling a project set up in this way that you type:

 
$ /bin/sh ./autogen.sh --help
 

and ignore the spurious warning that tells you configure will be executed.

bootstrap
Increasingly, projects are starting to call their bootstrap scripts `bootstrap'. Such scripts simply run the various commands required to bring the source tree into a state where the end user can simply:

 
$ configure
 $ make
 $ make install
 

Unfortunately, proponents of this school of thought don't put the bootstrap script in their distributed tarballs, since the script is unnecessary except when the build environment of a developer's machine has changed. This means the proponents of the autogen.sh school may never see the advantages of the other method.

Autoconf comes with a program called autoreconf which essentially does the work of the bootstrap script. autoreconf is rarely used because, historically, has not been very well known, and only in Autoconf 2.13 did it acquire the ability to work with Automake. Unfortunately, even the Autoconf 2.13 autoreconf does not handle libtoolize and some automake-related options that are frequently nice to use.

We recommend the bootstrap method, until autoreconf is fixed. At this point bootstrap has not been standardized, so here is a version of the script we used while writing this book (3):

 
#! /bin/sh
 
 aclocal \
 && automake --gnu --add-missing \
 && autoconf
 

We don't use autoreconf here because that script (as of Autoconf 2.13) also does not handle the `--add-missing' option, which we want. A typical bootstrap might also run libtoolize or autoheader.

It is also important for all developers on a project to have the same versions of the tools installed so that these rules don't inadvertantly cause problems due to differences between tool versions. This version skew problem turns out to be fairly significant in the field. So, automake provides a way to disable these rules by default, while still allowing users to enable them when they know their environment is set up correctly.

In order to enable this mode, you must first add AM_MAINTAINER_MODE to
`configure.in'. This will add the `--enable-maintainer-mode' option to `configure'; when specified this flag will cause these so-called `maintainer rules' to be enabled.

Note that maintainer mode is a controversial feature. Some people like to use it because it causes fewer bug reports in some situations. For instance, CVS does not preserve relatively timestamps on files. If your project has both `configure.in' and `configure' checked in, and maintainer mode is not in use, then sometimes make will decide to rebuild `configure' even though it is not really required. This in turn means more headaches for your developers -- on a large project most developers won't touch `configure.in' and many may not even want to install the GNU Autotools (4).

The other camp claims that end users should use the same build system that developers use, that maintainer mode is simply unaesthetic, and furthermore that the modality of maintainer mode is dangerous--you can easily forget what mode you are in and thus forget to rebuild, and thus correctly test, a change to the configure or build system. When maintainer mode is not in use, the Automake-supplied missing script will be used to warn users when it appears that they need a maintainer tool that they do not have.

9. A Small GNU Autotools Project

This chapter introduces a small--but real--worked example, to illustrate some of the features, and highlight some of the pitfalls, of the GNU Autotools discussed so far. All of the source can be downloaded from the book's web page (5). The text is peppered with my own pet ideas, accumulated over a several years of working with the GNU Autotools and you should be able to easily apply these to your own projects. I will begin by describing some of the choices and problems I encountered during the early stages of the development of this project. Then by way of illustration of the issues covered, move on to showing you a general infrastructure that I use as the basis for all of my own projects, followed by the specifics of the implementation of a portable command line shell library. This chapter then finishes with a sample shell application that uses that library.

9.1.1 Project Directory Structure

Before starting to write code for any project, you need to decide on the directory structure you will use to organise the code. I like to build each component of a project in its own subdirectory, and to keep the configuration sources separate from the source code. The great majority of GNU projects I have seen use a similar method, so adopting it yourself will likely make your project more familiar to your developers by association.

The top level directory is used for configuration files, such as `configure' and `aclocal.m4', and for a few other sundry files, `README' and a copy of the project license for example.

Any significant libraries will have a subdirectory of their own, containing all of the sources and headers for that library along with a `Makefile.am' and anything else that is specific to just that library. Libraries that are part of a small like group, a set of pluggable application modules for example, are kept together in a single directory.

The sources and headers for the project's main application will be stored in yet another subdirectory, traditionally named `src'. There are other conventional directories your developers might expect too: A `doc' directory for project documentation; and a `test' directory for the project self test suite.

To keep the project top-level directory as uncluttered as possible, as I like to do, you can take advantage of Autoconf's `AC_CONFIG_AUX_DIR' by creating another durectory, say `config', which will be used to store many of the GNU Autotools intermediate files, such as install-sh. I always store all project specific Autoconf M4 macros to this same subdirectory.

So, this is what you should start with:

 
$ pwd
 ~/mypackage
 $ ls -F
 Makefile.am  config/     configure.in  lib/  test/
 README       configure*  doc/          src/
 

9.1.2 C Header Files

There is a small amount of boiler-plate that should be added to all header files, not least of which is a small amount of code to prevent the contents of the header from being scanned multiple times. This is achieved by enclosing the entire file in a preprocessor conditional which evaluates to false after the first time it has been seen by the preprocessor. Traditionally, the macro used is in all upper case, and named after the installation path without the installation prefix. Imagine a header that will be intalled to `/usr/local/include/sys/foo.h', for example. The preprocessor code would be as follows:

 
#ifndef SYS_FOO_H
 #define SYS_FOO_H 1
 ...
 #endif /* !SYS_FOO_H */
 

Apart from comments, the entire content of the rest of this header file must be between these few lines. It is worth mentioning that inside the enclosing ifndef, the macro SYS_FOO_H must be defined before any other files are #included. It is a common mistake to not define that macro until the end of the file, but mutual dependency cycles are only stalled if the guard macro is defined before the #include which starts that cycle(6).

If a header is designed to be installed, it must #include other installed project headers from the local tree using angle-brackets. There are some implications to working like this:

  • You must be careful that the names of header file directories in the source tree match the names of the directories in the install tree. For example, when I plan to install the aforementioned `foo.h' to `/usr/local/include/project/foo.h', from which it will be included using `#include <project/foo.h>', then in order for the same include line to work in the source tree, I must name the source directory it is installed from `project' too, or other headers which use it will not be able to find it until after it has been installed.

  • When you come to developing the next version of a project laid out in this way, you must be careful about finding the correct header. Automake takes care of that for you by using `-I' options that force the compiler to look for uninstalled headers in the current source directory before searching the system directories for installed headers of the same name.

  • You don't have to install all of your headers to `/usr/include' -- you can use subdirectories. And all without having to rewrite the headers at install time.

9.1.3 C++ Compilers

In order for a C++ program to use a library compiled with a C compiler, it is neccessary for any symbols exported from the C library to be declared between `extern "C" {' and `}'. This code is important, because a C++ compiler mangles(7) all variable and function names, where as a C compiler does not. On the other hand, a C compiler will not understand these lines, so you must be careful to make them invisible to the C compiler.

Sometimes you will see this method used, written out in long hand in every installed header file, like this:

 
#ifdef __cplusplus
 extern "C" {
 #endif
 
 ...
 
 #ifdef __cplusplus
 }
 #endif
 

But that is a lot of unnecessary typing if you have a few dozen headers in your project. Also the additional braces tend to confuse text editors, such as emacs, which do automatic source indentation based on brace characters.

Far better, then, to declare them as macros in a common header file, and use the macros in your headers:

 
#ifdef __cplusplus
 #  define BEGIN_C_DECLS extern "C" {
 #  define END_C_DECLS   }
 #else /* !__cplusplus */
 #  define BEGIN_C_DECLS
 #  define END_C_DECLS
 #endif /* __cplusplus */
 

I have seen several projects that name such macros with a leading underscore -- `_BEGIN_C_DECLS'. Any symbol with a leading underscore is reserved for use by the compiler implementation, so you shouldn't name any symbols of your own in this way. By way of example, I recently ported the Small(8) language compiler to Unix, and almost all of the work was writing a Perl script to rename huge numbers of symbols in the compiler's reserved namespace to something more sensible so that GCC could even parse the sources. Small was originally developed on Windows, and the author had used a lot of symbols with a leading underscore. Although his symbol names didn't clash with his own compiler, in some cases they were the same as symbols used by GCC.

9.1.4 Function Definitions

As a stylistic convention, the return types for all function definitions should be on a separate line. The main reason for this is that it makes it very easy to find the functions in source file, by looking for a single identifier at the start of a line followed by an open parenthesis:

 
$ egrep '^[_a-zA-Z][_a-zA-Z0-9]*[ \t]*\(' error.c
 set_program_name (const char *path)
 error (int exit_status, const char *mode, const char *message)
 sic_warning (const char *message)
 sic_error (const char *message)
 sic_fatal (const char *message)
 

There are emacs lisp functions and various code analysis tools, such as ansi2knr (see section 9.1.6 K&R Compilers), which rely on this formatting convention, too. Even if you don't use those tools yourself, your fellow developers might like to, so it is a good convention to adopt.

9.1.5 Fallback Function Implementations

Due to the huge number of Unix varieties in common use today, many of the C library functions that you take for granted on your prefered development platform are very likely missing from some of the architectures you would like your code to compile on. Fundamentally there are two ways to cope with this:

  • Use only the few library calls that are available everywhere. In reality this is not actually possible because there are two lowest common denominators with mutually exclusive APIs, one rooted in BSD Unix (`bcopy', `rindex') and the other in SYSV Unix (`memcpy', `strrchr'). The only way to deal with this is to define one API in terms of the other using the preprocessor. The newer POSIX standard deprecates many of the BSD originated calls (with exceptions such as the BSD socket API). Even on non-POSIX platforms, there has been so much cross pollination that often both varieties of a given call may be provided, however you would be wise to write your code using POSIX endorsed calls, and where they are missing, define them in terms of whatever the host platform provides.

    This approach requires a lot of knowledge about various system libraries and standards documents, and can leave you with reams of preprocessor code to handle the differences between APIS. You will also need to perform a lot of checking in `configure.in' to figure out which calls are available. For example, to allow the rest of your code to use the `strcpy' call with impunity, you would need the following code in `configure.in':

     
    AC_CHECK_FUNCS(strcpy bcopy)
     

    And the following preprocessor code in a header file that is seen by every source file:

     
    #if !HAVE_STRCPY
     #  if HAVE_BCOPY
     #    define strcpy(dest, src)   bcopy (src, dest, 1 + strlen (src))
     #  else /* !HAVE_BCOPY */
          error no strcpy or bcopy
     #  endif /* HAVE_BCOPY */
     #endif /* HAVE_STRCPY */
     

  • Alternatively you could provide your own fallback implementations of function calls you know are missing on some platforms. In practice you don't need to be as knowledgable about problematic functions when using this approach. You can look in GNU libiberty(9) or Franзois Pinard's libit project(10) to see for which functions other GNU developers have needed to implement fallback code. The libit project is especially useful in this respect as it comprises canonical versions of fallback functions, and suitable Autoconf macros assembled from across the entire GNU project. I won't give an example of setting up your package to use this approach, since that is how I have chosen to structure the project described in this chapter.

Rather than writing code to the lowest common denominator of system libraries, I am a strong advocate of the latter school of thought in the majority of cases. As with all things it pays to take a pragmatic approach; don't be afraid of the middle ground -- weigh the options on a case by case basis.

9.1.6 K&R Compilers

K&R C is the name now used to describe the original C language specified by Brian Kernighan and Dennis Ritchie (hence, `K&R'). I have yet to see a C compiler that doesn't support code written in the K&R style, yet it has fallen very much into disuse in favor of the newer ANSI C standard. Although it is increasingly common for vendors to unbundle their ANSI C compiler, the GCC project (11) is available for all of the architectures I have ever used.

There are four differences between the two C standards:

  1. ANSI C expects full type specification in function prototypes, such as you might supply in a library header file:

     
    extern int functionname (const char *parameter1, size_t parameter 2);
     

    The nearest equivalent in K&R style C is a forward declaration, which allows you to use a function before its corresponding definition:

     
    extern int functionname ();
     

    As you can imagine, K&R has very bad type safety, and does not perform any checks that only function arguments of the correct type are used.

  2. The function headers of each function definition are written differently. Where you might see the following written in ANSI C:

     
    int
     functionname (const char *parameter1, size_t parameter2)
     {
       ...
     }
     

    K&R expects the parameter type declarations separately, like this:

     
    int
     functionname (parameter1, parameter2)
          const char *parameter1;
          size_t parameter2;
     {
       ...
     }
     

  3. There is no concept of an untyped pointer in K&R C. Where you might be used to seeing `void *' pointers in ANSI code, you are forced to overload the meaning of `char *' for K&R compilers.

  4. Variadic functions are handled with a different API in K&R C, imported with `#include <varargs.h>'. A K&R variadic function definition looks like this:

     
    int
     functionname (va_alist)
          va_dcl
     {
       va_list ap;
       char *arg;
     
       va_start (ap);
       ...
       arg = va_arg (ap, char *);
       ...
       va_end (ap);
     
       return arg ? strlen (arg) : 0;
     }
     

    ANSI C provides a similar API, imported with `#include <stdarg.h>', though it cannot express a variadic function with no named arguments such as the one above. In practice, this isn't a problem since you always need at least one parameter, either to specify the total number of arguments somehow, or else to mark the end of the argument list. An ANSI variadic function definition looks like this:

     
    int
     functionname (char *format, ...)
     {
       va_list ap;
       char *arg;
     
       va_start (ap, format);
       ...
       arg = va_arg (ap, char *);
       ...
       va_end (ap);
     
       return format ? strlen (format) : 0;
     }
     

Except in very rare cases where you are writing a low level project (GCC for example), you probably don't need to worry about K&R compilers too much. However, supporting them can be very easy, and if you are so inclined, can be handled either by employing the ansi2knr program supplied with Automake, or by careful use of the preprocessor.

Using ansi2knr in your project is described in some detail in section `Automatic de-ANSI-fication' in The Automake Manual, but boils down to the following:

  • Add this macro to your `configure.in' file:

     
    AM_C_PROTOTYPES
     

  • Rewrite the contents of `LIBOBJS' and/or `LTLIBOBJS' in the following fashion:

     
    # This is necessary so that .o files in LIBOBJS are also built via
     # the ANSI2KNR-filtering rules.
     Xsed='sed -e "s/^X//"'
     LIBOBJS=`echo X"$LIBOBJS"|\
     	[$Xsed -e 's/\.[^.]* /.\$U& /g;s/\.[^.]*$/.\$U&/']`
     

Personally, I dislike this method, since every source file is filtered and rewritten with ANSI function prototypes and declarations converted to K&R style adding a fair overhead in additional files in your build tree, and in compilation time. This would be reasonable were the abstraction sufficient to allow you to forget about K&R entirely, but ansi2knr is a simple program, and does not address any of the other differences between compilers that I raised above, and it cannot handle macros in your function prototypes of definitions. If you decide to use ansi2knr in your project, you must make the decision before you write any code, and be aware of its limitations as you develop.

For my own projects, I prefer to use a set of preprocessor macros along with a few stylistic conventions so that all of the differences between K&R and ANSI compilers are actually addressed, and so that the unfortunate few who have no access to an ANSI compiler (and who cannot use GCC for some reason) needn't suffer the overheads of ansi2knr.

The four differences in style listed at the beginning of this subsection are addressed as follows:

  1. The function protoype argument lists are declared inside a PARAMS macro invocation so that K&R compilers will still be able to compile the source tree. PARAMS removes ANSI argument lists from function prototypes for K&R compilers. Some developers continue to use __P for this purpose, but strictly speaking, macros starting with `_' (and especially `__') are reserved for the compiler and the system headers, so using `PARAMS', as follows, is safer:

     
    #if __STDC__
     #  ifndef NOPROTOS
     #    define PARAMS(args)      args
     #  endif
     #endif
     #ifndef PARAMS
     #  define PARAMS(args)        ()
     #endif
     

    This macro is then used for all function declarations like this:

     
    extern int functionname PARAMS((const char *parameter));
     

  2. With the PARAMS macro is used for all function declarations, ANSI compilers are given all the type information they require to do full compile time type checking. The function definitions proper must then be declared in K&R style so that K&R compilers don't choke on ANSI syntax. There is a small amount of overhead in writing code this way, however: The ANSI compile time type checking can only work in conjunction with K&R function definitions if it first sees an ANSI function prototype. This forces you to develop the good habit of prototyping every single function in your project. Even the static ones.

  3. The easiest way to work around the lack of void * pointers, is to define a new type that is conditionally set to void * for ANSI compilers, or char * for K&R compilers. You should add the following to a common header file:

     
    #if __STDC__
     typedef void *void_ptr;
     #else /* !__STDC__ */
     typedef char *void_ptr;
     #endif /* __STDC__ */
     

  4. The difference between the two variadic function APIs pose a stickier problem, and the solution is ugly. But it does work. FIrst you must check for the headers in `configure.in':

     
    AC_CHECK_HEADERS(stdarg.h varargs.h, break)
     

    Having done this, add the following code to a common header file:

     
    #if HAVE_STDARG_H
     #  include <stdarg.h>
     #  define VA_START(a, f)        va_start(a, f)
     #else
     #  if HAVE_VARARGS_H
     #    include <varargs.h>
     #    define VA_START(a, f)      va_start(a)
     #  endif
     #endif
     #ifndef VA_START
       error no variadic api
     #endif
     

    You must now supply each variadic function with both a K&R and an ANSI definition, like this:

     
    int
     #if HAVE_STDARG_H
     functionname (const char *format, ...)
     #else
     functionname (format, va_alist)
          const char *format;
          va_dcl
     #endif
     {
       va_alist ap;
       char *arg;
     
       VA_START (ap, format);
       ...
       arg = va_arg (ap, char *);
       ...
       va_end (ap);
     
       return arg : strlen (arg) ? 0;
     }
     

9.2 A Simple Shell Builders Library

An application which most developers try their hand at sooner or later is a Unix shell. There is a lot of functionality common to all traditional command line shells, which I thought I would push into a portable library to get you over the first hurdle when that moment is upon you. Before elabourating on any of this I need to name the project. I've called it sic, from the Latin so it is, because like all good project names it is somewhat pretentious and it lends itself to the recursive acronym sic is cumulative.

The gory detail of the minutae of the source is beyond the scope of this book, but to convey a feel for the need for Sic, some of the goals which influenced the design follow:

  • Sic must be very small so that, in addition to being used as the basis for a full blown shell, it can be linked (unadorned) into an application and used for trivial tasks, such as reading startup configuration.

  • It must not be tied to a particular syntax or set of reserved words. If you use it to read your startup configuration, I don't want to force you to use my syntax and commands.

  • The boundary between the library (`libsic') and the application must be well defined. Sic will take strings of characters as input, and internally parse and evaluate them according to registered commands and syntax, returning results or diagnostics as appropriate.

  • It must be extremely portable -- that is what I am trying to illustrate here, after all.

9.2.1 Portability Infrastructure

As I explained in 9.1.1 Project Directory Structure, I'll first create the project directories, a toplevel dirctory and a subdirectory to put the library sources into. I want to install the library header files to `/usr/local/include/sic', so the library subdirectory must be named appropriately. See section 9.1.2 C Header Files.

 
$ mkdir sic
 $ mkdir sic/sic
 $ cd sic/sic
 

I will describe the files I add in this section in more detail than the project specific sources, because they comprise an infrastructure that I use relatively unchanged for all of my GNU Autotools projects. You could keep an archive of these files, and use them as a starting point each time you begin a new project of your own.

9.2.1.1 Error Management

A good place to start with any project design is the error management facility. In Sic I will use a simple group of functions to display simple error messages. Here is `sic/error.h':

 
#ifndef SIC_ERROR_H
 #define SIC_ERROR_H 1
 
 #include <sic/common.h>
 
 BEGIN_C_DECLS
 
 extern const char *program_name;
 extern void set_program_name (const char *argv0);
 
 extern void sic_warning      (const char *message);
 extern void sic_error        (const char *message);
 extern void sic_fatal        (const char *message);
 
 END_C_DECLS
 
 #endif /* !SIC_ERROR_H */
 
 

This header file follows the principles set out in 9.1.2 C Header Files.

I am storing the program_name variable in the library that uses it, so that I can be sure that the library will build on architectures that don't allow undefined symbols in libraries (12).

Keeping those preprocessor macro definitions designed to aid code portability together (in a single file), is a good way to maintain the readability of the rest of the code. For this project I will put that code in `common.h':

 
#ifndef SIC_COMMON_H
 #define SIC_COMMON_H 1
 
 #if HAVE_CONFIG_H
 #  include <sic/config.h>
 #endif
 
 #include <stdio.h>
 #include <sys/types.h>
 
 #if STDC_HEADERS
 #  include <stdlib.h>
 #  include <string.h>
 #elif HAVE_STRINGS_H
 #  include <strings.h>
 #endif /*STDC_HEADERS*/
 
 #if HAVE_UNISTD_H
 #  include <unistd.h>
 #endif
 
 #if HAVE_ERRNO_H
 #  include <errno.h>
 #endif /*HAVE_ERRNO_H*/
 #ifndef errno
 /* Some systems #define this! */
 extern int errno;
 #endif
 
 #endif /* !SIC_COMMON_H */
 
 

You may recognise some snippets of code from the Autoconf manual here--- in particular the inclusion of the project `config.h', which will be generated shortly. Notice that I have been careful to conditionally include any headers which are not guaranteed to exist on every architecture. The rule of thumb here is that only `stdio.h' is ubiquitous (though I have never heard of a machine that has no `sys/types.h'). You can find more details of some of these in section `Existing Tests' in The GNU Autoconf Manual.

Here is a little more code from `common.h':

 
#ifndef EXIT_SUCCESS
 #  define EXIT_SUCCESS  0
 #  define EXIT_FAILURE  1
 #endif
 
 

The implementation of the error handling functions goes in `error.c' and is very straightforward:

 
#if HAVE_CONFIG_H
 #  include <sic/config.h>
 #endif
 
 #include "common.h"
 #include "error.h"
 
 static void error (int exit_status, const char *mode, 
                    const char *message);
 
 static void
 error (int exit_status, const char *mode, const char *message)
 {
   fprintf (stderr, "%s: %s: %s.\n", program_name, mode, message);
 
   if (exit_status >= 0)
     exit (exit_status);
 }
 
 void
 sic_warning (const char *message)
 {
   error (-1, "warning", message);
 }
 
 void
 sic_error (const char *message)
 {
   error (-1, "ERROR", message);
 }
 
 void
 sic_fatal (const char *message)
 {
   error (EXIT_FAILURE, "FATAL", message);
 }
 
 

I also need a definition of program_name; set_program_name copies the filename component of path into the exported data, program_name. The xstrdup function just calls strdup, but aborts if there is not enough memory to make the copy:

 
const char *program_name = NULL;
 
 void
 set_program_name (const char *path)
 {
   if (!program_name)
     program_name = xstrdup (basename (path));
 }
 

9.2.1.2 Memory Management

A useful idiom common to many GNU projects is to wrap the memory management functions to localise out of memory handling, naming them with an `x' prefix. By doing this, the rest of the project is relieved of having to remember to check for `NULL' returns from the various memory functions. These wrappers use the error API to report memory exhaustion and abort the program. I have placed the implementation code in `xmalloc.c':

 
#if HAVE_CONFIG_H
 #  include <sic/config.h>
 #endif
 
 #include "common.h"
 #include "error.h"
 
 void *
 xmalloc (size_t num)
 {
   void *new = malloc (num);
   if (!new)
     sic_fatal ("Memory exhausted");
   return new;
 }
 
 void *
 xrealloc (void *p, size_t num)
 {
   void *new;
 
   if (!p)
     return xmalloc (num);
 
   new = realloc (p, num);
   if (!new)
     sic_fatal ("Memory exhausted");
 
   return new;
 }
 
 void *
 xcalloc (size_t num, size_t size)
 {
   void *new = xmalloc (num * size);
   bzero (new, num * size);
   return new;
 }
 
 

Notice in the code above, that xcalloc is implemented in terms of xmalloc, since calloc itself is not available in some older C libraries.

Rather than create a separate `xmalloc.h' file, which would need to be #included from almost everywhere else, the logical place to declare these functions is in `common.h', since the wrappers will be called from most everywhere else in the code:

 
#ifdef __cplusplus
 #  define BEGIN_C_DECLS         extern "C" {
 #  define END_C_DECLS           }
 #else
 #  define BEGIN_C_DECLS
 #  define END_C_DECLS
 #endif
 
 #define XCALLOC(type, num)                                  \
         ((type *) xcalloc ((num), sizeof(type)))
 #define XMALLOC(type, num)                                  \
         ((type *) xmalloc ((num) * sizeof(type)))
 #define XREALLOC(type, p, num)                              \
         ((type *) xrealloc ((p), (num) * sizeof(type)))
 #define XFREE(stale)                            do {        \
         if (stale) { free (stale);  stale = 0; }            \
                                                 } while (0)
 
 BEGIN_C_DECLS
 
 extern void *xcalloc    (size_t num, size_t size);
 extern void *xmalloc    (size_t num);
 extern void *xrealloc   (void *p, size_t num);
 extern char *xstrdup    (const char *string);
 extern char *xstrerror  (int errnum);
 
 END_C_DECLS
 
 

By using the macros defined here, allocating and freeing heap memory is reduced from:

 
char **argv = (char **) xmalloc (sizeof (char *) * 3);
 do_stuff (argv);
 if (argv)
   free (argv);
 

to the simpler and more readable:

 
char **argv = XMALLOC (char *, 3);
 do_stuff (argv);
 XFREE (argv);
 

In the same spirit, I have borrowed `xstrdup.c' and `xstrerror.c' from project GNU's libiberty. See section 9.1.5 Fallback Function Implementations.

9.2.1.3 Generalised List Data Type

In many C programs you will see various implementations and re-implementations of lists and stacks, each tied to its own particular project. It is surprisingly simple to write a catch-all implementation, as I have done here with a generalised list operation API in `list.h':

 
#ifndef SIC_LIST_H
 #define SIC_LIST_H 1
 
 #include <sic/common.h>
 
 BEGIN_C_DECLS
 
 typedef struct list {
   struct list *next;    /* chain forward pointer*/
   void *userdata;       /* incase you want to use raw Lists */
 } List;
 
 extern List *list_new       (void *userdata);
 extern List *list_cons      (List *head, List *tail);
 extern List *list_tail      (List *head);
 extern size_t list_length   (List *head);
 
 END_C_DECLS
 
 #endif /* !SIC_LIST_H */
 
 

The trick is to ensure that any structures you want to chain together have their forward pointer in the first field. Having done that, the generic functions declared above can be used to manipulate any such chain by casting it to List * and back again as necessary.

For example:

 
struct foo {
   struct foo *next;
 
   char *bar;
   struct baz *qux;
   ...
 };
 
 ...
   struct foo *foo_list = NULL;
 
   foo_list = (struct foo *) list_cons ((List *) new_foo (),
                                        (List *) foo_list);
 ...
 

The implementation of the list manipulation functions is in `list.c':

 
#include "list.h"
 
 List *
 list_new (void *userdata)
 {
   List *new = XMALLOC (List, 1);
 
   new->next = NULL;
   new->userdata = userdata;
 
   return new;
 }
 
 List *
 list_cons (List *head, List *tail)
 {
   head->next = tail;
   return head;
 }
 
 List *
 list_tail (List *head)
 {
   return head->next;
 }
 
 size_t
 list_length (List *head)
 {
   size_t n;
   
   for (n = 0; head; ++n)
     head = head->next;
 
   return n;
 }
 
 

9.2.2.1 `sic.c' & `sic.h'

Here are the functions for creating and managing sic parsers.

 
#ifndef SIC_SIC_H
 #define SIC_SIC_H 1
 
 #include <sic/common.h>
 #include <sic/error.h>
 #include <sic/list.h>
 #include <sic/syntax.h>
 
 typedef struct sic {
   char *result;                 /* result string */
   size_t len;                   /* bytes used by result field */
   size_t lim;                   /* bytes allocated to result field */
   struct builtintab *builtins;  /* tables of builtin functions */
   SyntaxTable **syntax;         /* dispatch table for syntax of input */
   List *syntax_init;            /* stack of syntax state initialisers */
   List *syntax_finish;          /* stack of syntax state finalizers */
   SicState *state;              /* state data from syntax extensions */
 } Sic;
 
 #endif /* !SIC_SIC_H */
 
 

9.2.2.2 `builtin.c' & `builtin.h'

Here are the functions for managing tables of builtin commands in each Sic structure:

 
typedef int (*builtin_handler) (Sic *sic,
                                 int argc, char *const argv[]);
 
 typedef struct {
   const char *name;
   builtin_handler func;
   int min, max;
 } Builtin;
 
 typedef struct builtintab BuiltinTab;
 
 extern Builtin *builtin_find (Sic *sic, const char *name);
 extern int builtin_install   (Sic *sic, Builtin *table);
 extern int builtin_remove    (Sic *sic, Builtin *table);
 
 

9.2.2.3 `eval.c' & `eval.h'

Having created a Sic parser, and populated it with some Builtin handlers, a user of this library must tokenize and evaluate its input stream. These files define a structure for storing tokenized strings (Tokens), and functions for converting char * strings both to and from this structure type:

 
#ifndef SIC_EVAL_H
 #define SIC_EVAL_H 1
 
 #include <sic/common.h>
 #include <sic/sic.h>
 
 BEGIN_C_DECLS
 
 typedef struct {
   int  argc;            /* number of elements in ARGV */
   char **argv;          /* array of pointers to elements */
   size_t lim;           /* number of bytes allocated */
 } Tokens;
 
 extern int eval       (Sic *sic, Tokens *tokens);
 extern int untokenize (Sic *sic, char **pcommand, Tokens *tokens);
 extern int tokenize   (Sic *sic, Tokens **ptokens, char **pcommand);
 
 END_C_DECLS
 
 #endif /* !SIC_EVAL_H */
 
 

These files also define the eval function, which examines a Tokens structure in the context of the given Sic parser, dispatching the argv array to a relevant Builtin handler, also written by the library user.

9.2.2.4 `syntax.c' & `syntax.h'

When tokenize splits a char * string into parts, by default it breaks the string into words delimited by whitespace. These files define the interface for changing this default behaviour, by registering callback functions which the parser will run when it meets an `interesting' symbol in the input stream. Here are the declarations from `syntax.h':

 
BEGIN_C_DECLS
 
 typedef int SyntaxHandler (struct sic *sic, BufferIn *in,
                            BufferOut *out);
 
 typedef struct syntax {
   SyntaxHandler *handler;
   char *ch;
 } Syntax;
 
 extern int syntax_install (struct sic *sic, Syntax *table);
 extern SyntaxHandler *syntax_handler (struct sic *sic, int ch);
 
 END_C_DECLS
 
 

A SyntaxHandler is a function called by tokenize as it consumes its input to create a Tokens structure; the two functions associate a table of such handlers with a given Sic parser, and find the particular handler for a given character in that Sic parser, respectively.

9.2.3 Beginnings of a `configure.in'

Now that I have some code, I can run autoscan to generate a preliminary
`configure.in'. autoscan will examine all of the sources in the current directory tree looking for common points of non-portability, adding macros suitable for detecting the discovered problems. autoscan generates the following in `configure.scan':

 
# Process this file with autoconf to produce a configure script.
 AC_INIT(sic/eval.h)
 
 # Checks for programs.
 
 # Checks for libraries.
 
 # Checks for header files.
 AC_HEADER_STDC
 AC_CHECK_HEADERS(strings.h unistd.h)
 
 # Checks for typedefs, structures, and compiler characteristics.
 AC_C_CONST
 AC_TYPE_SIZE_T
 
 # Checks for library functions.
 AC_FUNC_VPRINTF
 AC_CHECK_FUNCS(strerror)
 
 AC_OUTPUT()
 

Since the generated `configure.scan' does not overwrite your project's `configure.in', it is a good idea to run autoscan periodically even in established project source trees, and compare the two files. Sometimes autoscan will find some portability issue you have overlooked, or weren't aware of.

Looking through the documentation for the macros in this `configure.scan',
AC_C_CONST and AC_TYPE_SIZE_T will take care of themselves (provided I ensure that `config.h' is included into every source file), and AC_HEADER_STDC and AC_CHECK_HEADERS(unistd.h) are already taken care of in `common.h'.

autoscan is no silver bullet! Even here in this simple example, I need to manually add macros to check for the presence of `errno.h':

 
AC_CHECK_HEADERS(errno.h strings.h unistd.h)
 

I also need to manually add the Autoconf macro for generating `config.h'; a macro to initialise automake support; and a macro to check for the presence of ranlib. These should go close to the start of `configure.in':

 
...
 AC_CONFIG_HEADER(config.h)
 AM_INIT_AUTOMAKE(sic, 0.5)
 
 AC_PROG_CC
 AC_PROG_RANLIB
 ...
 

An interesting macro suggested by autoscan is AC_CHECK_FUNCS(strerror). This tells me that I need to provide a replacement implementation of strerror for the benefit of architectures which don't have it in their system libraries. This is resolved by providing a file with a fallback implementation for the named function, and creating a library from it and any others that `configure' discovers to be lacking from the system library on the target host.

You will recall that `configure' is the shell script the end user of this package will run on their machine to test that it has all the features the package wants to use. The library that is created will allow the rest of the project to be written in the knowledge that any functions required by the project but missing from the installers system libraries will be available nonetheless. GNU `libiberty' comes to the rescue again -- it already has an implementation of `strerror.c' that I was able to use with a little modification.

Being able to supply a simple implementation of strerror, as the `strerror.c' file from `libiberty' does, relies on there being a well defined sys_errlist variable. It is a fair bet that if the target host has no strerror implementation, however, that the system sys_errlist will be broken or missing. I need to write a configure macro to check whether the system defines sys_errlist, and tailor the code in `strerror.c' to use this knowledge.

To avoid clutter in the top-level directory, I am a great believer in keeping as many of the configuration files as possible in their own sub-directory. First of all, I will create a new directory called `config' inside the top-level directory, and put `sys_errlist.m4' inside it:

 
AC_DEFUN(SIC_VAR_SYS_ERRLIST,
 [AC_CACHE_CHECK([for sys_errlist],
 sic_cv_var_sys_errlist,
 [AC_TRY_LINK([int *p;], [extern int sys_errlist; p = &sys_errlist;],
             sic_cv_var_sys_errlist=yes, sic_cv_var_sys_errlist=no)])
 if test x"$sic_cv_var_sys_errlist" = xyes; then
   AC_DEFINE(HAVE_SYS_ERRLIST, 1,
     [Define if your system libraries have a sys_errlist variable.])
 fi])
 
 

I must then add a call to this new macro in the `configure.in' file being careful to put it in the right place -- somwhere between typedefs and structures and library functions according to the comments in `configure.scan':

 
SIC_VAR_SYS_ERRLIST
 

GNU Autotools can also be set to store most of their files in a subdirectory, by calling the AC_CONFIG_AUX_DIR macro near the top of `configure.in', preferably right after AC_INIT:

 
AC_INIT(sic/eval.c)
 AC_CONFIG_AUX_DIR(config)
 AM_CONFIG_HEADER(config.h)
 ...
 

Having made this change, many of the files added by running autoconf and automake --add-missing will be put in the aux_dir.

The source tree now looks like this:

 
sic/
   +-- configure.scan
   +-- config/
   |     +-- sys_errlist.m4
   +-- replace/
   |     +-- strerror.c
   +-- sic/
         +-- builtin.c
         +-- builtin.h
         +-- common.h
         +-- error.c
         +-- error.h
         +-- eval.c
         +-- eval.h
         +-- list.c
         +-- list.h
         +-- sic.c
         +-- sic.h
         +-- syntax.c
         +-- syntax.h
         +-- xmalloc.c
         +-- xstrdup.c
         +-- xstrerror.c
 

In order to correctly utilise the fallback implementation, AC_CHECK_FUNCS(strerror) needs to be removed and strerror added to AC_REPLACE_FUNCS:

 
# Checks for library functions.
 AC_REPLACE_FUNCS(strerror)
 

This will be clearer if you look at the `Makefile.am' for the `replace' subdirectory:

 
## Makefile.am -- Process this file with automake to produce Makefile.in
 
 INCLUDES                =  -I$(top_builddir) -I$(top_srcdir)
 
 noinst_LIBRARIES        = libreplace.a
 libreplace_a_SOURCES        = 
 libreplace_a_LIBADD        = @LIBOBJS@
 
 

The code tells automake that I want to build a library for use within the build tree (i.e. not installed -- `noinst'), and that has no source files by default. The clever part here is that when someone comes to install Sic, they will run configure which will test for strerror, and add `strerror.o' to LIBOBJS if the target host environment is missing its own implementation. Now, when `configure' creates `replace/Makefile' (as I asked it to with AC_OUTPUT), `@LIBOBJS@' is replaced by the list of objects required on the installer's machine.

Having done all this at configure time, when my user runs make, the files required to replace functions missing from their target machine will be added to `libreplace.a'.

Unfortunately this is not quite enough to start building the project. First I need to add a top-level `Makefile.am' from which to ultimately create a top-level `Makefile' that will descend into the various subdirectories of the project:

 
## Makefile.am -- Process this file with automake to produce Makefile.in
 
 SUBDIRS = replace sic
 

And `configure.in' must be told where it can find instances of Makefile.in:

 
AC_OUTPUT(Makefile replace/Makefile sic/Makefile)
 

I have written a bootstrap script for Sic, for details see 8. Bootstrapping:

 
#! /bin/sh
 
 set -x
 aclocal -I config
 autoheader
 automake --foreign --add-missing --copy
 autoconf
 
 

The `--foreign' option to automake tells it to relax the GNU standards for various files that should be present in a GNU distribution. Using this option saves me from havng to create empty files as we did in 5. A Minimal GNU Autotools Project.

Right. Let's build the library! First, I'll run bootstrap:

 
$ ./bootstrap
 + aclocal -I config
 + autoheader
 + automake --foreign --add-missing --copy
 automake: configure.in: installing config/install-sh
 automake: configure.in: installing config/mkinstalldirs
 automake: configure.in: installing config/missing
 + autoconf
 

The project is now in the same state that an end-user would see, having unpacked a distribution tarball. What follows is what an end user might expect to see when building from that tarball:

 
$ ./configure
 creating cache ./config.cache
 checking for a BSD compatible install... /usr/bin/install -c
 checking whether build environment is sane... yes
 checking whether make sets ${MAKE}... yes
 checking for working aclocal... found
 checking for working autoconf... found
 checking for working automake... found
 checking for working autoheader... found
 checking for working makeinfo... found
 checking for gcc... gcc
 checking whether the C compiler (gcc  ) works... yes
 checking whether the C compiler (gcc  ) is a cross-compiler... no
 checking whether we are using GNU C... yes
 checking whether gcc accepts -g... yes
 checking for ranlib... ranlib
 checking how to run the C preprocessor... gcc -E
 checking for ANSI C header files... yes
 checking for unistd.h... yes
 checking for errno.h... yes
 checking for string.h... yes
 checking for working const... yes
 checking for size_t... yes
 checking for strerror... yes
 updating cache ./config.cache
 creating ./config.status
 creating Makefile
 creating replace/Makefile
 creating sic/Makefile
 creating config.h
 

Compare this output with the contents of `configure.in', and notice how each macro is ultimately responsible for one or more consecutive tests (via the Bourne shell code generated in `configure'). Now that the `Makefile's have been successfully created, it is safe to call make to perform the actual compilation:

 
$ make
 make  all-recursive
 make[1]: Entering directory `/tmp/sic'
 Making all in replace
 make[2]: Entering directory `/tmp/sic/replace'
 rm -f libreplace.a
 ar cru libreplace.a
 ranlib libreplace.a
 make[2]: Leaving directory `/tmp/sic/replace'
 Making all in sic
 make[2]: Entering directory `/tmp/sic/sic'
 gcc -DHAVE_CONFIG_H -I. -I. -I.. -I..    -g -O2 -c builtin.c
 gcc -DHAVE_CONFIG_H -I. -I. -I.. -I..    -g -O2 -c error.c
 gcc -DHAVE_CONFIG_H -I. -I. -I.. -I..    -g -O2 -c eval.c
 gcc -DHAVE_CONFIG_H -I. -I. -I.. -I..    -g -O2 -c list.c
 gcc -DHAVE_CONFIG_H -I. -I. -I.. -I..    -g -O2 -c sic.c
 gcc -DHAVE_CONFIG_H -I. -I. -I.. -I..    -g -O2 -c syntax.c
 gcc -DHAVE_CONFIG_H -I. -I. -I.. -I..    -g -O2 -c xmalloc.c
 gcc -DHAVE_CONFIG_H -I. -I. -I.. -I..    -g -O2 -c xstrdup.c
 gcc -DHAVE_CONFIG_H -I. -I. -I.. -I..    -g -O2 -c xstrerror.c
 rm -f libsic.a
 ar cru libsic.a builtin.o error.o eval.o list.o sic.o syntax.o xmalloc.o
 xstrdup.o xstrerror.o
 ranlib libsic.a
 make[2]: Leaving directory `/tmp/sic/sic'
 make[1]: Leaving directory `/tmp/sic'
 

On this machine, as you can see from the output of configure above, I have no need of the fallback implementation of strerror, so `libreplace.a' is empty. On another machine this might not be the case. In any event, I now have a compiled `libsic.a' -- so far, so good.

9.3 A Sample Shell Application

What I need now, is a program that uses `libsic.a', if only to give me confidence that it is working. In this section, I will write a simple shell which uses the library. But first, I'll create a directory to put it in:

 
$ mkdir src
 $ ls -F
 COPYING  Makefile.am  aclocal.m4  configure*    config/   sic/
 INSTALL  Makefile.in  bootstrap*  configure.in  replace/  src/
 $ cd src
 

In order to put this shell together, we need to provide just a few things for integration with `libsic.a'...

9.3.1 `sic_repl.c'

In `sic_repl.c'(13) there is a loop for reading strings typed by the user, evaluating them and printing the results. GNU readline is ideally suited to this, but it is not always available -- or sometimes people simply may not wish to use it.

With the help of GNU Autotools, it is very easy to cater for building with and without GNU readline. `sic_repl.c' uses this function to read lines of input from the user:

 
static char *
 getline (FILE *in, const char *prompt)
 {
   static char *buf = NULL;        /* Always allocated and freed
                                    from inside this function.  */
   XFREE (buf);
 
   buf = (char *) readline ((char *) prompt);
 
 #ifdef HAVE_ADD_HISTORY
   if (buf && *buf)
     add_history (buf);
 #endif
   
   return buf;
 }
 
 

To make this work, I must write an Autoconf macro which adds an option to `configure', so that when the package is installed, it will use the readline library if `--with-readline' is used:

 
AC_DEFUN(SIC_WITH_READLINE,
 [AC_ARG_WITH(readline,
 [  --with-readline         compile with the system readline library],
 [if test x"${withval-no}" != xno; then
   sic_save_LIBS=$LIBS
   AC_CHECK_LIB(readline, readline)
   if test x"${ac_cv_lib_readline_readline}" = xno; then
     AC_MSG_ERROR(libreadline not found)
   fi
   LIBS=$sic_save_LIBS
 fi])
 AM_CONDITIONAL(WITH_READLINE, test x"${with_readline-no}" != xno)
 ])
 
 

Having put this macro in the file `config/readline.m4', I must also call the new macro (SIC_WITH_READLINE) from `configure.in'.

9.3.2 `sic_syntax.c'

The syntax of the commands in the shell I am writing is defined by a set of syntax handlers which are loaded into `libsic' at startup. I can get the C preprocessor to do most of the repetitive code for me, and just fill in the function bodies:

 
#if HAVE_CONFIG_H
 #  include <sic/config.h>
 #endif
 
 #include "sic.h"
 
 /* List of builtin syntax. */
 #define syntax_functions                \
         SYNTAX(escape,  "\\")           \
         SYNTAX(space,   " \f\n\r\t\v")  \
         SYNTAX(comment, "#")            \
         SYNTAX(string,  "\"")           \
         SYNTAX(endcmd,  ";")            \
         SYNTAX(endstr,  "")
 
 /* Prototype Generator. */
 #define SIC_SYNTAX(name)                \
         int name (Sic *sic, BufferIn *in, BufferOut *out)
 
 #define SYNTAX(name, string)            \
         extern SIC_SYNTAX (CONC (syntax_, name));
 syntax_functions
 #undef SYNTAX
 
 /* Syntax handler mappings. */
 Syntax syntax_table[] = {
 
 #define SYNTAX(name, string)            \
         { CONC (syntax_, name), string },
   syntax_functions
 #undef SYNTAX
   
   { NULL, NULL }
 };
 
 

This code writes the prototypes for the syntax handler functions, and creates a table which associates each with one or more characters that might occur in the input stream. The advantage of writing the code this way is that when I want to add a new syntax handler later, it is a simple matter of adding a new row to the syntax_functions macro, and writing the function itself.

9.3.3 `sic_builtin.c'

In addition to the syntax handlers I have just added to the Sic shell, the language of this shell is also defined by the builtin commands it provides. The infrastructure for this file is built from a table of functions which is fed into various C preprocessor macros, just as I did for the syntax handlers.

One builtin handler function has special status, builtin_unknown. This is the builtin that is called, if the Sic library cannot find a suitable builtin function to handle the current input command. At first this doesn't sound especially important -- but it is the key to any shell implementation. When there is no builtin handler for the command, the shell will search the users command path, `$PATH', to find a suitable executable. And this is the job of builtin_unknown:

 
int
 builtin_unknown (Sic *sic, int argc, char *const argv[])
 {
   char *path = path_find (argv[0]);
   int status = SIC_ERROR;
 
   if (!path)
     {
       sic_result_append (sic, "command \"");
       sic_result_append (sic, argv[0]);
       sic_result_append (sic, "\" not found");
     }
   else if (path_execute (sic, path, argv) != SIC_OKAY)
     {
       sic_result_append (sic, "command \"");
       sic_result_append (sic, argv[0]);
       sic_result_append (sic, "\" failed: ");
       sic_result_append (sic, strerror (errno));
     }
   else
     status = SIC_OKAY;
 
   return status;
 }
 
 static char *
 path_find (const char *command)
 {
   char *path = xstrdup (command);
   
   if (*command == '/')
     {
       if (access (command, X_OK) < 0)
         goto notfound;
     }
   else
     {
       char *PATH = getenv ("PATH");
       char *pbeg, *pend;
       size_t len;
 
       for (pbeg = PATH; *pbeg != '\0'; pbeg = pend)
         {
           pbeg += strspn (pbeg, ":");
           len = strcspn (pbeg, ":");
           pend = pbeg + len;
           path = XREALLOC (char, path, 2 + len + strlen(command));
           *path = '\0';
           strncat (path, pbeg, len);
           if (path[len -1] != '/') strcat (path, "/");
           strcat (path, command);
           
           if (access (path, X_OK) == 0)
               break;
         }
 
       if (*pbeg == '\0')
           goto notfound;
     }
 
   return path;
 
  notfound:
   XFREE (path);
   return NULL;
 }  
 
 
 

Running `autoscan' again at this point adds AC_CHECK_FUNCS(strcspn strspn) to `configure.scan'. This tells me that these functions are not truly portable. As before I provide fallback implementations for these functions incase they are missing from the target host -- and as it turns out, they are easy to write:

 
/* strcspn.c -- implement strcspn() for architectures without it */
 
 #if HAVE_CONFIG_H
 #  include <sic/config.h>
 #endif
 
 #include <sys/types.h>
 
 #if STDC_HEADERS
 #  include <string.h>
 #elif HAVE_STRINGS_H
 #  include <strings.h>
 #endif
 
 #if !HAVE_STRCHR
 #  ifndef strchr
 #    define strchr index
 #  endif
 #endif
 
 size_t
 strcspn (const char *string, const char *reject)
 {
   size_t count = 0;
   while (strchr (reject, *string) == 0)
     ++count, ++string;
 
   return count;
 }
 
 

There is no need to add any code to `Makefile.am', because the configure script will automatically add the names of the missing function sources to `@LIBOBJS@'.

This implementation uses the autoconf generated `config.h' to get information about the availability of headers and type definitions. It is interesting that autoscan reports that strchr and strrchr, which are used in the fallback implementations of strcspn and strspn respectively, are themselves not portable! Luckily, the Autoconf manual tells me exactly how to deal with this: by adding some code to my `common.h' (paraphrased from the literal code in the manual):

 
#if !STDC_HEADERS
 #  if !HAVE_STRCHR
 #    define strchr index
 #    define strrchr rindex
 #  endif
 #endif
 

And another macro in `configure.in':

9.3.4 `sic.c' & `sic.h'

Since the application binary has no installed header files, there is little point in maintaining a corresponding header file for every source, all of the structures shared by these files, and non-static functions in these files are declared in `sic.h':

 
#ifndef SIC_H
 #define SIC_H 1
 
 #include <sic/common.h>
 #include <sic/sic.h>
 #include <sic/builtin.h>
 
 BEGIN_C_DECLS
 
 extern Syntax syntax_table[];
 extern Builtin builtin_table[];
 extern Syntax syntax_table[];
 
 extern int evalstream    (Sic *sic, FILE *stream);
 extern int evalline      (Sic *sic, char **pline);
 extern int source        (Sic *sic, const char *path);
 extern int syntax_init   (Sic *sic);
 extern int syntax_finish (Sic *sic, BufferIn *in, BufferOut *out);
 
 END_C_DECLS
 
 #endif /* !SIC_H */
 
 

To hold together everything you have seen so far, the main function creates a Sic parser and initialises it by adding syntax handler functions and builtin functions from the two tables defined earlier, before handing control to evalstream which will eventually exit when the input stream is exhausted.

 
int
 main (int argc, char * const argv[])
 {
   int result = EXIT_SUCCESS;
   Sic *sic = sic_new ();
   
   /* initialise the system */
   if (sic_init (sic) != SIC_OKAY)
       sic_fatal ("sic initialisation failed");
   signal (SIGINT, SIG_IGN);
   setbuf (stdout, NULL);
 
   /* initial symbols */
   sicstate_set (sic, "PS1", "] ", NULL);
   sicstate_set (sic, "PS2", "- ", NULL);
   
   /* evaluate the input stream */
   evalstream (sic, stdin);
 
   exit (result);
 }
 
 

Now, the shell can be built and used:

 
$ bootstrap
 ...
 $ ./configure --with-readline
 ...
 $ make
 ...
 make[2]: Entering directory `/tmp/sic/src'
 gcc -DHAVE_CONFIG_H -I. -I.. -I../sic -I.. -I../sic -g -c sic.c
 gcc -DHAVE_CONFIG_H -I. -I.. -I../sic -I.. -I../sic -g -c sic_builtin.c
 gcc -DHAVE_CONFIG_H -I. -I.. -I../sic -I.. -I../sic -g -c sic_repl.c
 gcc -DHAVE_CONFIG_H -I. -I.. -I../sic -I.. -I../sic -g -c sic_syntax.c
 gcc  -g -O2  -o sic  sic.o sic_builtin.o sic_repl.o sic_syntax.o \
 ../sic/libsic.a ../replace/libreplace.a -lreadline
 make[2]: Leaving directory `/tmp/sic/src'
 ...
 $ ./src/sic
 ] pwd
 /tmp/sic
 ] ls -F
 Makefile     aclocal.m4   config.cache    configure*    sic/
 Makefile.am  bootstrap*   config.log      configure.in  src/
 Makefile.in  config/      config.status*  replace/
 ] exit
 $
 

This chapter has developed a solid foundation of code, which I will return to in 12. A Large GNU Autotools Project, when Libtool will join the fray. The chapters leading up to that explain what Libtool is for, how to use it and integrate it into your own projects, and the advantages it offers over building shared libraries with Automake (or even just Make) alone.

Оставьте свой комментарий !

Ваше имя:
Комментарий:
Оба поля являются обязательными

 Автор  Комментарий к данной статье