Autoconf, Automake, and Libtool
4. Introducing `Makefile's
A `Makefile' is a specification of dependencies between files and
how to resolve those dependencies such that an overall goal, known as a
target, can be reached. `Makefile's are processed by the
make utility. Other references describe the syntax of
`Makefile's and the various implementations of make in
detail. This chapter provides an overview of `Makefile's and
gives just enough information to write custom rules in a
`Makefile.am'.
4.1 Targets and dependencies
The make program attempts to bring a target up to date by
bringing all of the target's dependencies up to date. These dependencies
may have further dependencies. Thus, a potentially complex dependency
graph forms when processing a typical `Makefile'. From a simple
`Makefile' that looks like this:
|
all: foo
foo: foo.o bar.o baz.o
.c.o:
	$(CC) $(CFLAGS) -c $< -o $@
.l.c:
	$(LEX) $< && mv lex.yy.c $@
|
We can draw a dependency graph that looks like this:
|
               all
                |
               foo
                |
        .-------+-------.
       /        |        \
    foo.o     bar.o     baz.o
      |         |         |
    foo.c     bar.c     baz.c
                          |
                        baz.l
|
Unless the `Makefile' contains a directive telling make otherwise, all
targets are assumed to be filenames, and rules must be written to create
these files or somehow bring them up to date.
When leaf nodes are found in the dependency graph, the `Makefile'
must include a set of shell commands to bring the dependent up to date
with the dependency. Much to the chagrin of many make users,
up to date means the dependent has a more recent timestamp than
the dependency. Moreover, each of these shell commands is run in its own
sub-shell and, unless the `Makefile' instructs make
otherwise, each command must exit with an exit code of 0 to indicate
success.
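The timestamp comparison can be sketched in portable shell. This is only an illustration of the idea, not how any make implementation is written: the function name needs_rebuild is made up for this example, and it assumes a find that supports `-newer'.

```shell
# needs_rebuild TARGET DEP...
# Succeeds (exit 0) when TARGET is missing, or when any DEP has a newer
# modification time than TARGET. Only the timestamp test is modelled.
needs_rebuild() {
    target=$1
    shift
    test -f "$target" || return 0
    for dep in "$@"; do
        # find prints the dependency only if it is newer than the target
        if test -n "$(find "$dep" -newer "$target" 2>/dev/null)"; then
            return 0
        fi
    done
    return 1
}
```

With this in place, something like `needs_rebuild foo.o foo.c && echo "foo.o is out of date"' mimics the decision make takes before running the commands for foo.o. Note that the file contents are never compared, only the timestamps.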
Target rules can be written which are executed unconditionally. This is
achieved by specifying that the target has no dependencies. A simple rule
of this kind, which should be familiar to most users, is a `clean' target
that removes generated files.
4.2 Makefile syntax
`Makefile's have a rather particular syntax that can trouble new
users. There are many implementations of make, some of which
provide non-portable extensions. An abridged description of the syntax
follows which, for portability, may be stricter than you are used to.
Comments start with a `#' and continue until the end of line. They
may appear anywhere except in command sequences--if they do, they will
be interpreted by the shell running the command. The following
`Makefile' shows three individual targets with dependencies on
each:
|
target1: dep1 dep2 ... depN
<tab> cmd1
<tab> cmd2
<tab> ...
<tab> cmdN
target2: dep4 dep5
<tab> cmd1
<tab> cmd2
dep4 dep5:
<tab> cmd1
|
Target rules start at the beginning of a line and are followed by a
colon. Following the colon is a whitespace-separated list of
dependencies. A series of lines follows, containing shell commands
to be run by a sub-shell (the default is the Bourne shell). Each of
these lines must be prefixed by a horizontal tab character.
Omitting the tab is the most common mistake made by new make users.
These commands may be prefixed by an `@' character to prevent
make from echoing the command line prior to executing it. They
may also optionally be prefixed by a `-' character to allow the
rule to continue if the command returns a non-zero exit code. The
combination of both characters is permitted.
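The way command lines are run can be imitated with a few lines of shell. The sketch below is an assumption-laden model rather than make's real behavior in every detail: run_recipe is a hypothetical helper, it starts a fresh `sh -c' for every line, stops at the first failure, and treats a leading `-' as "ignore the exit code". Echoing and the `@' prefix are not modelled.

```shell
# run_recipe LINE...
# Runs each LINE in a fresh shell, roughly as make does for the command
# lines of a rule. A leading '-' means a non-zero exit code is ignored.
run_recipe() {
    for line in "$@"; do
        case $line in
            -*) sh -c "${line#-}" || : ;;     # failure tolerated
            *)  sh -c "$line" || return 1 ;;  # failure aborts the rule
        esac
    done
}
```

Because every line gets its own shell, a variable set (or a directory changed) on one line is gone by the next: `run_recipe 'x=1' 'echo "x=${x:-unset}"'' prints `x=unset'. This is why multi-step commands in a `Makefile' are usually joined with `&&' on a single line.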
4.3 Macros
A number of useful macros exist which may be used anywhere throughout
the `Makefile'. Macros start with a dollar sign, like shell
variables. Our first `Makefile' used a few:
|
$(CC) $(CFLAGS) -c $< -o $@
|
Here, syntactic forms of `$(..)' are make variable
expansions. It is possible to define a make variable using a
`var=value' syntax, for example `CC = ec++'.
In a `Makefile', $(CC) will then be literally replaced by
`ec++'. make has a number of built-in variables and
default values. The default value for `$(CC)' is cc.
Other built-in macros exist with fixed semantics. The two most common
macros are $@ and $<. They represent the names of the
target and the first dependency for the rule in which they appear.
$@ is available in any rule, but for some versions of
make, $< is only available in suffix rules. Here is a
simple `Makefile':
|
all: dummy
	@echo "$@ depends on dummy"
dummy:
	touch $@
|
This is what make outputs when processing this
`Makefile':
|
$ make
touch dummy
all depends on dummy
|
The GNU Make manual documents these macros in more detail.
4.4 Suffix rules
To simplify a `Makefile', there is a special kind of rule syntax
known as a suffix rule. This is a wildcard pattern that can
match targets. Our first `Makefile' used some. Here is one:
|
.c.o:
	$(CC) $(CFLAGS) -c $< -o $@
|
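To make the roles of `$@' and `$<' in this rule concrete, here is a small shell sketch that prints the command the rule would run for a given target. The function name expand_suffix_rule is invented for this illustration, and the defaults (cc for CC, nothing for CFLAGS) are assumptions that match the usual make defaults.

```shell
# expand_suffix_rule TARGET
# Prints the command that a `.c.o' suffix rule would run for TARGET.
expand_suffix_rule() {
    target=$1                        # plays the role of $@
    case $target in
        *.o) src=${target%.o}.c ;;   # plays the role of $<
        *)   return 1 ;;             # the rule only matches `.o' targets
    esac
    echo "${CC-cc} ${CFLAGS-} -c $src -o $target"
}
```

With CC=gcc and CFLAGS=-O2 set, `expand_suffix_rule foo.o' prints `gcc -O2 -c foo.c -o foo.o', which is the expansion make performs when it needs foo.o and finds foo.c.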
Unless a more specific rule matches the target being sought, this rule
will match any target that ends in `.o'. These files are said to
always be dependent on `.c'. With some background material now
presented, let's take a look at these tools in use.
5. A Minimal GNU Autotools Project
This chapter describes how to manage a minimal project using the
GNU Autotools. A minimal project is defined to be the smallest possible
project that can still illustrate a sufficient number of principles in
using the tools. By studying a smaller project, it becomes easier to
understand the more complex interactions between these tools when larger
projects require advanced features.
The example project used throughout this chapter is a fictitious command
interpreter called foonly. foonly is written in C,
but like many interpreters, uses a lexical analyzer and a parser
expressed using the lex and yacc tools. The package
will be developed to adhere to the GNU `Makefile' standard,
which is the default behavior for Automake.
There are many features of the GNU Autotools that this small project will
not utilize. The most noteworthy one is libraries; this package does
not produce any libraries of its own, so Libtool will not feature in
this chapter. The more complex projects presented in 9. A Small GNU Autotools Project and 12. A Large GNU Autotools Project will illustrate
how Libtool participates in the build system. The purpose of this
chapter will be to provide a high-level overview of the user-written
files and how they interact.
5.1 User-Provided Input Files
The smallest project requires the user to provide only two files. The
remainder of the files needed to build the package are generated by the
GNU Autotools (see section 5.2 Generated Output Files).
- `Makefile.am' is an input to automake.
- `configure.in' is an input to autoconf.
I like to think of `Makefile.am' as a high-level, bare-bones
specification of a project's build requirements: what needs to be built,
and where does it go when it is installed? This is probably Automake's
greatest strength--the description is about as simple as it could
possibly be, yet the final product is a `Makefile' with an array of
convenient make targets.
The `configure.in' is a template of macro invocations and shell
code fragments that are used by autoconf to produce a
`configure' script (see section C. Generated File Dependencies).
autoconf copies the contents of `configure.in' to
`configure', expanding macros as they occur in the input. Other
text is copied verbatim.
Let's take a look at the contents of the user-provided input files that
are relevant to this minimal project. Here is the `Makefile.am':
|
bin_PROGRAMS = foonly
foonly_SOURCES = main.c foo.c foo.h nly.c scanner.l parser.y
foonly_LDADD = @LEXLIB@
|
This `Makefile.am' specifies that we want a program called
`foonly' to be built and installed in the `bin' directory when
make install is run. The source files that are used to build
`foonly' are the C source files `main.c', `foo.c',
`nly.c' and `foo.h', the lex program in
`scanner.l' and a yacc grammar in `parser.y'. This
points out a particularly nice aspect about Automake: because
lex and yacc both generate intermediate C programs
from their input files, Automake knows how to build such intermediate
files and link them into the final executable. Finally, we must
remember to link a suitable lex library, if `configure'
concludes that one is needed.
And here is the `configure.in':
|
dnl Process this file with autoconf to produce a configure script.
AC_INIT(main.c)
AM_INIT_AUTOMAKE(foonly, 1.0)
AC_PROG_CC
AM_PROG_LEX
AC_PROG_YACC
AC_OUTPUT(Makefile)
|
This `configure.in' invokes some mandatory Autoconf and Automake
initialization macros, and then calls on some Autoconf macros from the
AC_PROG family to find suitable C compiler, lex, and
yacc programs. Finally, the AC_OUTPUT macro is used to
cause the generated `configure' script to output a
`Makefile'--but from what? It is processed from
`Makefile.in', which Automake produces for you based on your
`Makefile.am' (see section C. Generated File Dependencies).
5.2 Generated Output Files
By studying the diagram in C. Generated File Dependencies, it should
be possible to see which commands must be run to generate the required
output files from the input files shown in the last section.
First, we generate `configure' by running aclocal and then autoconf.
Because `configure.in' contains macro invocations which are not
known to autoconf itself (AM_INIT_AUTOMAKE being a case in
point), it is necessary to collect all of the macro definitions for
autoconf to use when generating `configure'. This is done using
the aclocal program, so called because it generates
`aclocal.m4' (see section C. Generated File Dependencies). If you were to
examine the contents of `aclocal.m4', you would find the definition
of the AM_INIT_AUTOMAKE macro contained within.
After running autoconf, you will find a `configure'
script in the current directory. It is important to run aclocal
first because automake relies on the contents of
`configure.in' and `aclocal.m4'. On to automake:
|
$ automake --add-missing
automake: configure.in: installing ./install-sh
automake: configure.in: installing ./mkinstalldirs
automake: configure.in: installing ./missing
automake: Makefile.am: installing ./INSTALL
automake: Makefile.am: required file ./NEWS not found
automake: Makefile.am: required file ./README not found
automake: Makefile.am: installing ./COPYING
automake: Makefile.am: required file ./AUTHORS not found
automake: Makefile.am: required file ./ChangeLog not found
|
The `--add-missing' option copies some boilerplate files from
your Automake installation into the current directory. Files such as
`COPYING', which contains the GNU General Public License, change
infrequently, and so can be generated without user intervention. A
number of utility scripts are also installed--these are used by the
generated `Makefile's, particularly by the install target.
Notice that some required files are still missing. These are:
- `NEWS': A record of user-visible changes to a package. The format is not
strict, but the changes to the most recent version should appear at the
top of the file.
- `README': The first place a user will look to get an overview of the purpose of a
package, and perhaps special installation instructions.
- `AUTHORS': Lists the names, and usually mail addresses, of individuals who worked
on the package.
- `ChangeLog': An important file that records the changes that are made
to a package. The format of this file is quite strict
(see section 5.5 Documentation and ChangeLogs).
For now, we'll do enough to placate Automake:
|
$ touch NEWS README AUTHORS ChangeLog
$ automake --add-missing
|
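The commands run so far can be collected into a small bootstrap script. The following is a sketch under a few assumptions: the function name bootstrap and the RUN variable are made up here, and the tool order simply follows this section (aclocal, autoconf, then automake). Setting RUN=echo gives a dry run that prints the commands instead of executing them.

```shell
# bootstrap: regenerate the derived files in the order used in this section.
# Set RUN=echo beforehand for a dry run that only prints the commands.
bootstrap() {
    ${RUN-} aclocal || return 1     # collects macro definitions into aclocal.m4
    ${RUN-} autoconf || return 1    # configure.in + aclocal.m4 -> configure
    for f in NEWS README AUTHORS ChangeLog; do
        # create empty placeholders only where a file is missing
        test -f "$f" || ${RUN-} touch "$f" || return 1
    done
    ${RUN-} automake --add-missing  # Makefile.am -> Makefile.in
}
```

Calling `bootstrap' from the top-level directory of the package then repeats the whole sequence; existing NEWS, README, AUTHORS and ChangeLog files are left alone.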
Automake has now produced a `Makefile.in'. At this point, you may
wish to take a snapshot of this directory before we really let loose
with automatically generated files.
By now, the contents of the directory will be looking fairly complete
and reminiscent of the top-level directory of a GNU package you may
have installed in the past:
|
AUTHORS INSTALL NEWS install-sh mkinstalldirs
COPYING Makefile.am README configure missing
ChangeLog Makefile.in aclocal.m4 configure.in
|
It should now be possible to package up your tree in a tar file
and give it to other users for them to install on their own systems.
One of the make targets that Automake generates in
`Makefile.in' makes it easy to generate distributions.
A user would merely have to
unpack the tar file, run configure (see section 3. How to run configure and make) and finally type make all:
|
$ ./configure
creating cache ./config.cache
checking for a BSD compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking whether make sets ${MAKE}... yes
checking for working aclocal... found
checking for working autoconf... found
checking for working automake... found
checking for working autoheader... found
checking for working makeinfo... found
checking for gcc... gcc
checking whether the C compiler (gcc ) works... yes
checking whether the C compiler (gcc ) is a cross-compiler... no
checking whether we are using GNU C... yes
checking whether gcc accepts -g... yes
checking how to run the C preprocessor... gcc -E
checking for flex... flex
checking for flex... (cached) flex
checking for yywrap in -lfl... yes
checking lex output file root... lex.yy
checking whether yytext is a pointer... yes
checking for bison... bison -y
updating cache ./config.cache
creating ./config.status
creating Makefile
$ make all
gcc -DPACKAGE=\"foonly\" -DVERSION=\"1.0\" -DYYTEXT_POINTER=1 -I. -I. \
-g -O2 -c main.c
gcc -DPACKAGE=\"foonly\" -DVERSION=\"1.0\" -DYYTEXT_POINTER=1 -I. -I. \
-g -O2 -c foo.c
flex scanner.l && mv lex.yy.c scanner.c
gcc -DPACKAGE=\"foonly\" -DVERSION=\"1.0\" -DYYTEXT_POINTER=1 -I. -I. \
-g -O2 -c scanner.c
bison -y parser.y && mv y.tab.c parser.c
if test -f y.tab.h; then \
if cmp -s y.tab.h parser.h; then rm -f y.tab.h; \
else mv y.tab.h parser.h; fi; \
else :; fi
gcc -DPACKAGE=\"foonly\" -DVERSION=\"1.0\" -DYYTEXT_POINTER=1 -I. -I. \
-g -O2 -c parser.c
gcc -g -O2 -o foonly main.o foo.o scanner.o parser.o -lfl
|
5.3 Maintaining Input Files
If you edit any of the GNU Autotools input files in your package, it is
necessary to regenerate the machine-generated files for these changes to
take effect. For instance, if you add a new source file to the
foonly_SOURCES variable in `Makefile.am', it is necessary
to regenerate the derived file `Makefile.in'. If you are building
your package, you need to re-run configure to regenerate the
site-specific `Makefile', and then re-run make to compile
the new source file and link it into `foonly'.
It is possible to regenerate these files by running the required tools,
one at a time. However, as we can see above, it can be difficult to
compute the dependencies--does a particular change require
aclocal to be run? Does a particular change require
autoconf to be run? There are two solutions to this problem.
The first solution is to use the autoreconf command. This
tool regenerates all derived files by re-running all of the necessary
tools in the correct order. It is somewhat of a brute force solution,
but it works very well, particularly if you are not trying to accommodate
other maintainers, or regular maintenance that would render this command
bothersome.
The alternative is Automake's `maintainer mode'. By invoking the
AM_MAINTAINER_MODE macro from `configure.in', automake will
activate an `--enable-maintainer-mode' option in
`configure'. This is explained at length in 8. Bootstrapping.
5.4 Packaging Generated Files
The debate about what to do with generated files is one which is keenly
contested on the relevant Internet mailing lists. There are two points
of view and I will present both of them to you so that you can try to
decide what the best policy is for your project.
One argument is that generated files should not be included with a
package, but rather only the `preferred form' of the source code
should be included. By this definition, `configure' is a derived
file, just like an object file, and it should not be included in the
package. Thus, the user should use the GNU Autotools to bootstrap
themselves prior to building the package. I believe there is some merit
to this purist approach, as it discourages the practice of packaging
derived files.
The other argument is that the advantages of providing these files can
far outweigh the violation of good software engineering practice
mentioned above. By including the generated files, users have the
convenience of not needing to be concerned with keeping up to date with
all of the different versions of the tools in active use. This is
especially true for Autoconf, as `configure' scripts are often
generated by maintainers using locally modified versions of
autoconf and locally installed macros. If `configure'
were regenerated by the user, the result could be different to that
intended. Of course, this is poor practice, but it happens to reflect
reality.
I believe the answer is to include generated files in the package when
the package is going to be distributed to a wide user community (i.e. the
general public). For in-house packages, the former argument might make
more sense, since the tools may also be held under version control.
5.5 Documentation and ChangeLogs
As with any software project, it is important to maintain documentation
as the project evolves--the documentation must reflect the current state
of the software, but it must also accurately record the changes that
have been made in the past. The GNU coding standard rigorously
enforces the maintenance of documentation. Automake, in fact,
implements some of the standard by checking for the presence of a
`ChangeLog' file when automake is run!
A number of files exist, with standardized filenames, for storing
documentation in GNU packages. The complete GNU coding
standard, which offers some useful insights, can be found at
http://www.gnu.org/prep/standards.html.
Other projects, including in-house projects, can use these same
tried-and-true techniques. The purpose of most of the standard
documentation files was outlined earlier (see section 5.2 Generated Output Files),
but the `ChangeLog' deserves additional treatment.
When recording changes in a `ChangeLog', one entry is made per
person. Logical changes are grouped together, while logically distinct
changes (i.e. `change sets') are separated by a single blank line.
Here is an example from Automake's own `ChangeLog':
|
1999-11-21  Tom Tromey  <tromey@cygnus.com>

	* automake.in (finish_languages): Only generate suffix rule
	when not doing dependency tracking.

	* m4/init.m4 (AM_INIT_AUTOMAKE): Use AM_MISSING_INSTALL_SH.
	* m4/missing.m4 (AM_MISSING_INSTALL_SH): New macro.

	* depend2.am: Use @SOURCE@, @OBJ@, @LTOBJ@, @OBJOBJ@,
	and @BASE@.  Always use -o.
|
Another important point to make about `ChangeLog' entries is that
they should be brief. It is not necessary for an entry to explain in
detail why a change was made, but rather what the change
was. If a change is not straightforward then the explanation of
why belongs in the source code itself. The GNU coding
standard offers the complete set of guidelines for keeping
`ChangeLog's. Although any text editor can be used to create
ChangeLog entries, Emacs provides a major mode to help you write them.
6. Writing `configure.in'
Writing a portable `configure.in' is a tricky business. Since you
can put arbitrary shell code into `configure.in', your options seem
overwhelming. There are many questions the first-time Autoconf user
asks: What constructs are portable and what constructs aren't portable?
How do I decide what to check for? What shouldn't I check for? How do
I best use Autoconf's features? What shouldn't I put in
`configure.in'? In what order should I run my checks? When should
I look at the name of the system instead of checking for specific
features?
6.1 What is Portability?
Before we talk about the mechanics of deciding what to check for and how
to check for it, let's ask ourselves a simple question: what is
portability? Portability is a quality of the code that enables it to be
built and run on a variety of platforms. In the Autoconf context,
portability usually refers to the ability to run on Unix-like
systems--sometimes including Windows.
When I first started using Autoconf, I had a hard time deciding what to
check for in my `configure.in'. At the time, I was maintaining a
proprietary program that ran only on SunOS 4. However, I was interested
in porting it to Solaris, OSF/1, and possibly Irix.
The approach I took, while workable, was relatively time-consuming and
painful: I wrote a minimal `configure.in' and then proceeded to
simply try to build my program on Solaris. Each time I encountered a
build problem, I updated `configure.in' and my source and started
again. Once it built correctly, I started testing to see if there were
runtime problems related to portability.
Since I didn't start with a relatively portable base, and since I was
unaware of the tools available to help with adding Autoconf support to a
package (see section 24. Migrating an Existing Package to GNU Autotools), it was much more
difficult than it had to be. If at all possible, it is better to write
portable code to begin with.
There are a large number of Unix-like systems in the world, including
many systems which, while still running, can only be considered
obsolete. While it is probably possible to port some programs to all
such systems, typically it isn't useful to even try. Porting to
everything is a difficult process, especially given that it usually
isn't possible to test on all platforms, and that new operating systems,
with their own bugs and idiosyncrasies, are released every year.
We advocate a pragmatic approach to portability: we write our programs
to target a fairly large, but also fairly modern, cross-section of
Unix-like systems. As deficiencies are discovered in our portability
framework, we update `configure.in' and our sources, and move on.
In practice, this is an effective approach.
6.2 Brief introduction to portable sh
If you read a number of `configure.in's, you'll quickly notice that
they tend to be written in an unusual style. For instance, you'll
notice you hardly ever see the `[' program used; instead you'll see
`test' invoked. We won't go into all the details of writing a
portable shell script here; instead we leave that for 22. Writing Portable Bourne Shell.
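As a small taste of that style, here is a sketch of the usual way a yes/no value is tested. The function name feature_status is hypothetical. It uses `test' rather than `[' (square brackets have a special meaning to M4, which autoconf uses to process `configure.in'), and it prefixes the operand with `x' so that an empty value, or one that starts with `-', cannot be mistaken for an operator by an old `test' implementation.

```shell
# feature_status VALUE
# Prints whether VALUE means "enabled", using the conservative
# `test "x$var" = xvalue' idiom.
feature_status() {
    if test "x$1" = xyes; then
        echo "feature enabled"
    else
        echo "feature disabled"
    fi
}
```

The same comparison written as `[ "$1" = yes ]' is fine in an ordinary script, but inside `configure.in' the brackets would need extra quoting to survive M4, so `test' is the path of least resistance.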
Like other aspects of portability, the approach you take to writing
shell scripts in `configure.in' and `Makefile.am' should
depend on your goals. Some platforms have notoriously broken
sh implementations. For instance, Ultrix sh doesn't
implement unset. Of course, the GNU Autotools are written in the
most portable style possible, so as not to limit your possibilities.
Also, it doesn't really make sense to talk about portable sh
programming in the abstract. sh by itself does very little; most
actual work is done by separate programs, each with its own potential
portability problems. For instance, some options are not portable
between systems, and some seemingly common programs don't exist on every
system -- so not only do you have to know which sh constructs are
not portable, but you also must know which programs you can (and cannot)
use, and which options to those programs are portable.
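One way to avoid depending on a helper program that may be missing is to search PATH by hand. The sketch below is only in the spirit of what a program check does, not a copy of any Autoconf macro: find_prog is a name made up for this example, and it assumes that `test -x' is available, which was not true of every historical sh.

```shell
# find_prog NAME
# Prints the first executable called NAME found in PATH and succeeds,
# or fails silently if there is none.
find_prog() {
    prog=$1
    save_IFS=$IFS
    IFS=:
    for dir in $PATH; do
        IFS=$save_IFS
        test -n "$dir" || dir=.      # an empty PATH entry means "."
        if test -f "$dir/$prog" && test -x "$dir/$prog"; then
            echo "$dir/$prog"
            return 0
        fi
    done
    IFS=$save_IFS
    return 1
}
```

A script can then write `if find_prog flex >/dev/null; then ...' and fall back to another tool otherwise, instead of assuming the program exists.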
This seems daunting, but in practice it doesn't seem to be too hard to
write portable shell scripts -- once you've internalized the rules.
Unfortunately, this process can take a long time. Meanwhile, a
pragmatic `try and see' approach, while noting other portable code
you've seen elsewhere, works fairly well. Once again, it pays to be
aware of which architectures you'll probably care about -- you will make
different choices if you are writing an extremely portable program like
emacs or gcc than if you are writing something that will
only run on various flavors of Linux. Also, the cost of having
unportable code in `configure.in' is relatively low -- in general
it is fairly easy to rewrite pieces on demand as unportable constructs
are found.
6.3 Ordering Tests
In addition to the problem of writing portable sh code, another
problem which confronts first-time `configure.in' writers is
determining the order in which to run the various tests. Autoconf
indirectly (via the autoscan program, which we cover in
24. Migrating an Existing Package to GNU Autotools) suggests a standard ordering, which
is what we describe here.
The standard ordering is:
- Boilerplate. This section should include standard boilerplate code,
such as the call to AC_INIT (which must be first),
AM_INIT_AUTOMAKE, AC_CONFIG_HEADER, and perhaps AC_REVISION.
- Options. The next section should include macros which add command-line
options to configure, such as AC_ARG_ENABLE. It is
typical to put support code for the option in this section as well, if
it is short enough, like this example from libgcj:
|
AC_ARG_ENABLE(getenv-properties,
[  --disable-getenv-properties
                          don't set system properties from GCJ_PROPERTIES])
dnl Whether GCJ_PROPERTIES is used depends on the target.
dnl Fall back to the default only when the user gave no option.
if test -z "$enable_getenv_properties"; then
   enable_getenv_properties=${enable_getenv_properties_default-yes}
fi
if test "$enable_getenv_properties" = no; then
   AC_DEFINE(DISABLE_GETENV_PROPERTIES)
fi
|
- Programs. Next it is traditional to check for programs that are
needed by the configure process, the build process, or one of the
programs being built. This usually involves calls to macros like
AC_CHECK_PROG and AC_PATH_TOOL.
- Libraries. Checks for libraries come before checks for other objects
visible to C (or C++, or anything else). This is necessary because some
other checks work by trying to link or run a program; by checking for
libraries first you ensure that the resulting programs can be linked.
- Headers. Next come checks for the existence of headers.
- Typedefs and structures. We do checks for typedefs after checking for
headers for the simple reason that typedefs appear in headers, and we
need to know which headers we can use before we look inside them.
- Functions. Finally we check for functions. These come last because
functions have dependencies on the preceding items: when searching for
functions, libraries are needed in order to correctly link, headers are
needed in order to find prototypes (this is especially important for
C++, which has stricter prototyping rules than C), and typedefs are
needed for those functions which use or return types which are not built
in.
- Output. This is done by invoking AC_OUTPUT.
This ordering should be considered a rough guideline, and not a list of
hard-and-fast rules. Sometimes it is necessary to interleave tests,
either to make `configure.in' easier to maintain, or because the
tests themselves do need to be in a different order. For instance, if
your project uses both C and C++ you might choose to do all the C++
checks after all the C checks are done, in order to make
`configure.in' a bit easier to read.
6.4 What to check for
Deciding what to check for is really the central part of writing
`configure.in'. Once you've read the Autoconf reference manual,
the "how"s of writing a particular test should be fairly clear. The
"when"s might remain a mystery -- and it's just as easy to check for too
many things as it is to check for too few.
One notable area of divergence between various Unix-like systems is that
the same programs don't exist on all systems, and, even when they do,
they don't always work in the same way. For these problems we
recommend, when possible, following the advice of the GNU Coding
Standards: use the most common options from a relatively limited set of
programs. Failing that, try to stick to programs and options specified
by POSIX, perhaps augmenting this approach by doing checks for known
problems on platforms you care about.
Checking for tools and their differences is usually a fairly small part
of a `configure' script; more common are checks for functions,
libraries, and the like.
Except for a few core libraries like `libc' and, usually,
`libm' and libraries like `libX11' which typically aren't
considered system libraries, there isn't much agreement about library
names or contents between Unix systems. Still, libraries are easy to
handle, because decisions about libraries almost always only affect the
various `Makefile's. That means that checking for another library
typically doesn't require major (or even, sometimes, any) changes to the
source code. Also, because adding a new library test has a small impact
on the development cycle -- effectively just re-running `configure'
and then a relink -- you can effectively adopt a lax approach to
libraries. For instance, you can just make things work on the few
systems you immediately care about and then handle library changes on an
as-needed basis.
Suppose you do end up with a link problem. How do you handle it? The
first thing to do is use nm to look through the system libraries
to see if the missing function exists. If it does, and it is in a
library you can use, then the solution is easy -- just add another
AC_CHECK_LIB. Note that just finding the function in a library
is not enough, because on some systems, some "standard" libraries are
undesirable; `libucb' is the most common example of a library which
you should avoid.
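The nm search can be scripted. The helper below is a rough sketch, not a robust tool: defines_symbol is a hypothetical name, and it assumes BSD-style nm output in which each line ends with a one-letter symbol type followed by the symbol name. Other nm output formats would need a different filter.

```shell
# defines_symbol SYMBOL
# Reads nm output on standard input and succeeds if SYMBOL is listed as
# defined code (type T or t) or as a weak definition (type W or w).
# Undefined references (type U) do not count.
defines_symbol() {
    awk -v sym="$1" '
        $NF == sym && $(NF - 1) ~ /^[TtWw]$/ { found = 1 }
        END { exit !found }
    '
}
```

It would be used along the lines of `nm libfoo.a | defines_symbol inet_aton', where libfoo.a stands for whichever library you are inspecting. A hit only tells you the symbol is there, so the caveat about undesirable libraries still applies.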
If you can't find the function in a system library then you have a
somewhat more difficult problem: a non-portable function. There are
basically three approaches to a missing function. Below we talk about
functions, but really these same approaches apply, more or less, to
typedefs, structures, and global variables.
The first approach is to write a replacement function and either
conditionally compile it, or put it into an appropriately-named file and
use AC_REPLACE_FUNCS. For instance, Tcl uses
AC_REPLACE_FUNCS(strstr) to handle systems that have no
strstr function.
The second approach is used when there is a similar function with a
different name. The idea here is to check for all the alternatives and
then modify your source to use whichever one might exist. The idiom
here is to use break in the second argument to
AC_CHECK_FUNCS; this is used both to skip unnecessary tests and
to indicate to the reader that these checks are related. For instance,
here is how libgcj checks for inet_aton or
inet_addr; it only uses the first one found:
|
AC_CHECK_FUNCS(inet_aton inet_addr, break)
|
Code to use the results of these checks looks something like:
|
#if HAVE_INET_ATON
... use inet_aton here
#else
#if HAVE_INET_ADDR
... use inet_addr here
#else
#error Function missing!
#endif
#endif
|
Note how we've made it a compile-time error if the function does not
exist. In general it is best to make errors occur as early as possible
in the build process.
The third approach to non-portable functions is to write code such that
these functions are only optionally used. For instance, if you are
writing an editor you might decide to use mmap to map a file into
the editor's memory. However, since mmap is not portable, you
would also write a function to use the more portable read.
Handling known non-portable functions is only part of the problem,
however. The pragmatic approach works fairly well, but it is somewhat
inefficient if you are primarily developing on a more modern system,
like GNU/Linux, which has few functions missing. In this case the
problem is that you might not notice non-portable constructs in your
code until it has largely been finished.
Unfortunately, there's no high road to solving this problem. In the
end, you need to have a working knowledge of the range of existing Unix
systems. Knowledge of standards such as POSIX and XPG can be useful
here, as a first cut -- if it isn't in POSIX, you should at least
consider checking for it. However, standards are not a panacea -- not
all systems are POSIX compliant, and sometimes there are bugs in system
functions which you must work around.
One final class of problems you might encounter is that it is also easy
to check for too much. This is bad because it adds unnecessary
maintenance burden to your program. For instance, sometimes you'll see
code that checks for <sys/types.h>. However, there's no point in
doing that -- using this header is mostly portable. Again, this can
only be addressed by having a practical knowledge, which is only really
possible by examining your target systems.
6.5 Using Configuration Names
While feature tests are definitely the best approach, a `configure'
script may occasionally have to make a decision based on a configuration
name. This may be necessary if certain code must be compiled
differently based on something which cannot be tested using a standard
Autoconf feature test. For instance, the expect package needs to
find information about the system's `tty' implementation; this
can't reliably be done when cross compiling without examining the
particular configuration name.
It is normally better to test for particular features, rather than to
test for a particular system type. This is because as Unix and other
operating systems evolve, different systems copy features from one
another.
When there is no alternative to testing the configuration name in a
`configure' script, it is best to define a macro which describes
the feature, rather than defining a macro which describes the particular
system. This permits the same macro to be used on other systems
which adopt the same feature (see section 23. Writing New Macros for Autoconf).
Testing for a particular system is normally done using a case statement
in the autoconf `configure.in' file. The case statement
might look something like the following, assuming that `host' is a
shell variable holding a canonical configuration name--which will be
the case if `configure.in' uses the `AC_CANONICAL_HOST' or
`AC_CANONICAL_SYSTEM' macros.
|
case "${host}" in
  i[[3456]]86-*-linux-gnu*) do something ;;
  sparc*-sun-solaris2.[[56789]]*) do something ;;
  sparc*-sun-solaris*) do something ;;
  mips*-*-elf*) do something ;;
esac
|
Note the doubled square brackets in this piece of code. These are used
to work around an ugly implementation detail of autoconf: it
uses M4 under the hood. Without these extra brackets, the square
brackets in the case statement would be swallowed by M4, and
would not appear in the resulting `configure'. This nasty detail
is discussed at greater length in 21. M4.
It is particularly important to use `*' after the operating system
field, in order to match the version number which will be generated by
`config.guess'. In most cases you must be careful to match a range
of processor types. For most processor families, a trailing `*'
suffices, as in `mips*' above. For the i386 family, something
along the lines of `i[34567]86' suffices at present. For the m68k
family, you will need something like `m68*'. Of course, if you do
not need to match on the processor, it is simpler to just replace the
entire field by a `*', as in `*-*-irix*'.
7. Introducing GNU Automake
The primary goal of Automake is to generate `Makefile.in's
compliant with the GNU Makefile Standards. Along the way, it tries
to remove boilerplate and drudgery. It also helps the `Makefile'
writer by implementing features (for instance automatic dependency
tracking and parallel make support) that most maintainers don't
have the patience to implement by hand. It also implements some best
practices as well as workarounds for vendor make bugs -- both of
which require arcane knowledge not generally available.
A secondary goal for Automake is that it work well with other free
software, and, specifically, GNU tools. For example, Automake has
support for DejaGNU-based test suites.
Chances are that you don't care about the GNU Coding Standards.
That's okay. You'll still appreciate the convenience that Automake
provides, and you'll find that the GNU standards compliance
feature, for the most part, assists rather than impedes.
Automake helps the maintainer with five large tasks, and countless minor
ones. The basic functional areas are:
- Build
- Check
- Clean
- Install and uninstall
- Distribution
We cover the first three items in this chapter, and the others in later
chapters. Before we get into the details, let's talk a bit about some
general principles of Automake.
7.1 General Automake principles
Automake at its simplest turns a file called `Makefile.am' into a
GNU-compliant `Makefile.in' for use with `configure'. Each
`Makefile.am' is written according to make syntax; Automake
recognizes special macro and target names and generates code based on
these.
There are a few Automake rules which differ slightly from make
rules:
- Ordinary make comments are passed through to the output, but
  comments beginning with `##' are Automake comments and are not
  passed through.
- Automake supports include directives. These directives are not
  passed through to the `Makefile.in', but instead are processed by
  automake -- files included this way are treated as if they
  were textually included in `Makefile.am' at that point. This can
  be used to add boilerplate to each `Makefile.am' in a project via a
  centrally-maintained file. The filename to include can start with
  `$(top_srcdir)' to indicate that it should be found relative to the
  top-most directory of the project; if it is a relative path or if it
  starts with `$(srcdir)' then it is relative to the current
  directory. For example, here is how you would reference boilerplate
  code from the file `config/Make-rules' (where `config' is a
  top-level directory in the project):
|
include $(top_srcdir)/config/Make-rules
|
- Automake supports conditionals which are not passed directly through to
  `Makefile.in'. This feature is discussed in 19. Advanced GNU Automake Usage.
- Automake supports macro assignment using `+='; these assignments
  are translated by Automake into ordinary `=' assignments in
  `Makefile.in'.
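To see several of these differences side by side, here is a small sketch of a `Makefile.am'. It reuses the `config/Make-rules' file from the list above; the program name `doit' and its source files are only placeholders.

```makefile
## This comment is for Makefile.am readers only; Automake drops it.
# This comment is copied through to Makefile.in.

## Shared boilerplate, read and expanded by Automake itself.
include $(top_srcdir)/config/Make-rules

bin_PROGRAMS = doit
doit_SOURCES = main.c
## Automake folds the next line into a single `=' assignment.
doit_SOURCES += doit.c
```

Everything else in such a file is ordinary make syntax and is passed along unchanged.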
All macros and targets, including those which Automake does not
recognize, are passed through to the generated `Makefile.in' --
this is a powerful extension mechanism. Sometimes Automake will define
macros or targets internally. If these are also defined in
`Makefile.am' then the definition in `Makefile.am' takes
precedence. This feature provides an easy way to tailor specific parts
of the output in small ways.
Note, however, that it is a mistake to override parts of the generated
code that aren't documented (and thus `exported' by Automake).
Overrides like this stand a good chance of not working with future
Automake releases.
Automake also scans `configure.in'. Sometimes it uses the
information it discovers to generate extra code, and sometimes to
provide extra error checking. Automake also turns every AC_SUBST
into a `Makefile' variable. This is convenient in more ways than
one: not only does it mean that you can refer to these macros in
`Makefile.am' without extra work, but, since Automake scans
`configure.in' before it reads any `Makefile.am', it also
means that special variables and overrides Automake recognizes can be
defined once in `configure.in'.
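As a rough sketch of what this buys you, suppose `configure.in' calls `AC_PATH_PROG(PERL, perl)', which performs an `AC_SUBST' of `PERL'. A `Makefile.am' can then use `$(PERL)' directly. The target name `check-syntax' and the script `report.pl' are made up for this illustration.

```makefile
## Assumes configure.in contains:  AC_PATH_PROG(PERL, perl)
## That macro calls AC_SUBST(PERL), so Automake writes
## `PERL = @PERL@' into Makefile.in without being asked.

## Hypothetical target and script name, for illustration only.
check-syntax:
	$(PERL) -c $(srcdir)/report.pl
```

No explicit `PERL = @PERL@' line is needed in `Makefile.am'.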
7.2 Introduction to Primaries
Each type of object that Automake understands has a special root
variable name associated with it. This root is called a primary.
Many actual variable names put into `Makefile.am' are constructed
by adding various prefixes to a primary.
For instance, scripts--interpreted executable programs--are associated
with the SCRIPTS primary. Here is how you would list scripts to
be installed in the user's `bindir':
|
bin_SCRIPTS = magic-script
|
(Note that the mysterious `bin_' prefix will be discussed later.)
The contents of a primary-derived variable are treated as targets in the
resulting `Makefile'. For instance, in our example above, we could
generate `magic-script' using sed by simply introducing it
as a target:
|
bin_SCRIPTS = magic-script
magic-script: magic-script.in
	sed -e 's/whatever//' < $(srcdir)/magic-script.in > magic-script
	chmod +x magic-script
|
7.3 The easy primaries
This section describes the common primaries that are relatively easy to
understand; the more complicated ones are discussed in the next section.
DATA
- This is the easiest primary to understand. A macro of this type lists a
number of files which are installed verbatim. These files can appear
either in the source directory or the build directory.
HEADERS
- Macros of this type list header files. These are separate from
DATA macros because this allows for extra error checking in some
cases.
SCRIPTS
- This is used for executable scripts (interpreted programs). These are
different from
DATA because they are installed with different
permissions and because they have the program name transform applied to
them (e.g., the `--program-transform-name' argument to
configure). Scripts are also different from compiled programs
because the latter can be stripped while scripts cannot.
MANS
- This lists man pages. Installing man pages is more complicated than you
might think due to the lack of a single common practice. One developer
might name a man page in the source tree `foo.man' and then rename
it to the real name (`foo.1') at install time. Another developer
might instead use numeric suffixes in the source tree and install using
the same name. Sometimes an alphabetic code follows the numeric suffix
(e.g., `quux.3n'); this code must be stripped before determining
the correct install directory (this file must still be installed in
`$(man3dir)'). Automake supports all of these modes of operation:
- man_MANS can be used when numeric suffixes are already in place:
|
man_MANS = foo.1 bar.2 quux.3n
|
- man1_MANS, man2_MANS, etc., can be used to force renaming
  at install time. This renaming is skipped if the suffix already begins
  with the correct number. For instance:
|
man1_MANS = foo.man
man3_MANS = quux.3n
|
  Here `foo.man' will be installed as `foo.1' but `quux.3n'
  will keep its name at install time.
TEXINFOS
- GNU programs traditionally use the Texinfo documentation format,
not man pages. Automake has full support for Texinfo, including some
additional features such as versioning and
install-info support.
We won't go into that here except to mention that it exists. See the
Automake reference manual for more information.
Automake supports a variety of lesser-used primaries such as JAVA
and LISP (and, in the next major release, PYTHON). See
the reference manual for more information on these.
7.4 Programs and libraries
The preceding primaries have all been relatively easy to use. Now we'll
discuss a more complicated set, namely those used to build programs and
libraries. These primaries are more complex because building a program
is more complex than building a script (which often doesn't even need
building at all).
Use the PROGRAMS primary for programs, LIBRARIES for
libraries, and LTLIBRARIES for Libtool libraries
(see section 10. Introducing GNU Libtool). Here is a minimal example:
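A `Makefile.am' containing nothing but the following line would match the description below (the listing is a reconstruction from that description, using the `bin_' prefix and the program name it mentions):

```makefile
bin_PROGRAMS = doit
```

With no `SOURCES' variable given, the single source file is assumed to be named after the program.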
This creates the program doit and arranges to install it in
bindir. First make will compile `doit.c' to produce
`doit.o'. Then it will link `doit.o' to create `doit'.
Of course, if you have more than one source file, and most programs do,
then you will want to be able to list them somehow. You will do this
via the program's SOURCES variable. Each program or library has
a set of associated variables whose names are constructed by appending
suffixes to the `normalized' name of the program. The normalized
name is the name of the object with non-alphanumeric characters changed
to underscores. For instance, the normalized name of `quux' is
`quux', but the normalized name of `install-info' is
`install_info'. Normalized names are used because they correspond
to make syntax, and, like all macros, Automake propagates these
definitions into the resulting `Makefile.in'.
So if `doit' is to be built from files `main.c' and
`doit.c', we would write:
|
bin_PROGRAMS = doit
doit_SOURCES = doit.c main.c
|
The same holds for libraries. In the zlib package we might make a
library called `libzlib.a'. Then we would write:
|
lib_LIBRARIES = libzlib.a
libzlib_a_SOURCES = adler32.c compress.c crc32.c deflate.c deflate.h \
gzio.c infblock.c infblock.h infcodes.c infcodes.h inffast.c inffast.h \
inffixed.h inflate.c inftrees.c inftrees.h infutil.c infutil.h trees.c \
trees.h uncompr.c zconf.h zlib.h zutil.c zutil.h
|
We can also do this with libtool libraries. For instance, suppose we
want to build `libzlib.la' instead:
|
lib_LTLIBRARIES = libzlib.la
libzlib_la_SOURCES = adler32.c compress.c crc32.c deflate.c deflate.h \
gzio.c infblock.c infblock.h infcodes.c infcodes.h inffast.c inffast.h \
inffixed.h inflate.c inftrees.c inftrees.h infutil.c infutil.h trees.c \
trees.h uncompr.c zconf.h zlib.h zutil.c zutil.h
|
As you can see, making shared libraries with Automake and Libtool is
just as easy as making static libraries.
In the above example, we listed header files in the SOURCES
variable. These are ignored (except by make dist (1)) but can serve
to make your `Makefile.am' a bit clearer (and sometimes shorter, if you
aren't installing headers).
Note that you can't use `configure' substitutions in a
SOURCES variable. Automake needs to know the static list
of files which can be compiled into your program. There are still
various ways to conditionally compile files, for instance Automake
conditionals or the use of the LDADD variable.
The static list of files is also used in some versions of Automake's
automatic dependency tracking. The general rule is that each source
file which might be compiled should be listed in some SOURCES
variable. If the source is conditionally compiled, it can be listed in
an EXTRA variable. For instance, suppose in this example
`@FOO_OBJ@' is conditionally set by `configure' to
`foo.o' when `foo.c' should be compiled:
|
bin_PROGRAMS = foo
foo_SOURCES = main.c
foo_LDADD = @FOO_OBJ@
foo_DEPENDENCIES = @FOO_OBJ@
EXTRA_foo_SOURCES = foo.c
|
In this case, `EXTRA_foo_SOURCES' is used to list sources which are
conditionally compiled; this tells Automake that they exist even though
it can't deduce their existence automatically.
In the above example, note the use of the `foo_LDADD' macro. This
macro is used to list other object files and libraries which should be
linked into the foo program. Each program or library has several
such associated macros which can be used to customize the link step;
here we list the most common ones:
- `_DEPENDENCIES'
- Extra dependencies which are added to the program's dependency list. If
not specified, this is automatically computed based on the value of the
program's `_LDADD' macro.
- `_LDADD'
- Extra objects which are passed to the linker. This is only used by
programs and shared libraries.
- `_LDFLAGS'
- Flags which are passed to the linker. This is separate from
`_LDADD' to allow `_DEPENDENCIES' to be auto-computed.
- `_LIBADD'
- Like `_LDADD', but used for static libraries and not programs.
You aren't required to define any of these macros.
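As an illustration only, the fragment below uses three of these macros together. All of the names (`doit', `libsupport.a', `/opt/lib') are placeholders, and `@LIBOBJS@' is assumed to be substituted by `configure' in the usual way.

```makefile
## All names below are illustrative.
noinst_LIBRARIES = libsupport.a
libsupport_a_SOURCES = support.c
## Extra objects folded into the static library.
libsupport_a_LIBADD = @LIBOBJS@

bin_PROGRAMS = doit
doit_SOURCES = doit.c main.c
## Objects and libraries to link; also feeds the computed dependencies.
doit_LDADD = libsupport.a
## Plain linker flags, kept apart from the list of things to link.
doit_LDFLAGS = -L/opt/lib
```

Since `doit_DEPENDENCIES' is not set here, it is computed from `doit_LDADD', so `doit' is relinked whenever `libsupport.a' changes.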
7.5 Frequently Asked Questions
Experience has shown that there are several common questions that arise
as people begin to use automake for their own projects. It seemed
prudent to mention these issues here.
Users often want to make a library (or program, but for some reason it
comes up more frequently with libraries) whose sources live in
subdirectories:
|
lib_LIBRARIES = libsub.a
libsub_a_SOURCES = subdir1/something.c ...
|
If you try this with Automake 1.4, you'll get an error:
|
$ automake
automake: Makefile.am: not supported: source file subdir1/something.c is in subdirectory
|
For libraries, this problem is most simply solved by using libtool
convenience libraries. For programs, there is no simple solution. Many
people elect to restructure their package in this case.
The next major release of Automake addresses this problem.
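For the library case, a convenience-library layout could be sketched roughly as follows. Each subdirectory builds an uninstalled libtool library, and the parent directory folds it into the installed one. The file and library names are invented, and both directories are assumed to be listed in `configure.in'.

```makefile
## subdir1/Makefile.am  (hypothetical layout)
noinst_LTLIBRARIES = libsomething.la
libsomething_la_SOURCES = something.c

## Makefile.am in the parent directory
SUBDIRS = subdir1
lib_LTLIBRARIES = libsub.la
libsub_la_SOURCES = top.c
libsub_la_LIBADD = subdir1/libsomething.la
```

The `noinst_' prefix is what makes `libsomething.la' a convenience library: it is built but never installed on its own.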
Another general problem that comes up is that of setting compilation
flags. Most rules have flags--for instance, compilation of C code
automatically uses `CFLAGS'. However, these variables are
considered user variables. Setting them in `Makefile.am' is
unsafe, because the user will expect to be able to override them at
will.
To handle this, for each flag variable, Automake introduces an `AM_'
version which can be set in `Makefile.am'. For instance, we could
set some flags for C and C++ compilation like so:
|
AM_CFLAGS = -DFOR_C
AM_CXXFLAGS = -DFOR_CXX
|
Finally, people often ask how to compile a single source file in two
different ways. For instance, the `etags.c' file which comes with
Emacs can be compiled with different `-D' options to produce the
etags and ctags programs.
With Automake 1.4 this can only be done by writing your own compilation
rules, like this:
|
bin_PROGRAMS = etags ctags
etags_SOURCES = etags.c
ctags_SOURCES =
ctags_LDADD = ctags.o
etags.o: etags.c
	$(CC) $(CFLAGS) -DETAGS ...
ctags.o: etags.c
	$(CC) $(CFLAGS) -DCTAGS ...
|
This is tedious and hard to maintain for larger programs. Automake 1.5
will support a much more natural approach:
|
bin_PROGRAMS = etags ctags
etags_SOURCES = etags.c
etags_CFLAGS = -DETAGS
ctags_SOURCES = etags.c
ctags_CFLAGS = -DCTAGS
|
7.6 Multiple directories
So far, we've only dealt with single-directory projects. Automake can
also handle projects with many directories. The variable `SUBDIRS'
is used to list the subdirectories which should be built. Here is an
example from Automake itself:
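A `SUBDIRS' line consistent with the discussion below, with `.' listed first, would look like this (a reconstruction; the directory names `m4' and `tests' are assumed rather than taken from this text):

```makefile
SUBDIRS = . m4 tests
```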
Automake does not need to know the list of subdirectories statically, so
there is no `EXTRA_SUBDIRS' variable. You might think that
Automake would use `SUBDIRS' to see which `Makefile.am's to
scan, but it actually gets this information from `configure.in'.
This means that, if you have a subdirectory which is optionally built,
you should still list it unconditionally in your call to
AC_OUTPUT and then arrange for it to be substituted (or not, as
appropriate) at configure time.
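One way this might be arranged is sketched below. The directory `gui', the variable `GUI_SUBDIR' and the `enable_gui' test are all hypothetical, and the `configure.in' side is shown only in comments.

```makefile
## Makefile.am -- assumes configure.in contains something like:
##
##   if test "$enable_gui" = yes; then GUI_SUBDIR=gui; else GUI_SUBDIR=; fi
##   AC_SUBST(GUI_SUBDIR)
##   AC_OUTPUT(Makefile src/Makefile gui/Makefile)
##
## `gui' and GUI_SUBDIR are hypothetical names.
SUBDIRS = src @GUI_SUBDIR@
```

Note that `gui/Makefile' is always generated; only the decision to descend into `gui' is made at configure time. Later Automake releases also consult a `DIST_SUBDIRS' variable to decide what make dist ships, so check the reference manual for the version you use.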
Subdirectories are always built in the order they appear, but cleaning
rules (e.g., maintainer-clean) are always run in the reverse
order. The reason for this odd reversal is that it is wrong to remove a
file before removing all the files which depend on it.
You can put `.' into `SUBDIRS' to control when the objects in
the current directory are built, relative to the objects in the
subdirectories. In the example above, targets in `.' will be built
before subdirectories are built. If `.' does not appear in
`SUBDIRS', it is built following all the subdirectories.
7.7 Testing
Automake also includes simple support for testing your program.
The simplest form of this is the `TESTS' variable. This
variable holds a list of tests which are run when the user runs
make check. Each test is built (if necessary) and then executed.
For each test, make prints a single line indicating whether the
test has passed or failed. Failure means exiting with a non-zero
status, with the special exception that an exit status of `77' (2)
means that the test should be ignored. make check also prints a
summary showing the number of passes and fails.
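Because a test is just a program whose exit status is inspected, a compiled test can report all three outcomes itself. The following is a minimal sketch in C. The function `check_fixture' and the idea of testing the first byte of a fixture file are invented for the example; only the meaning of the status values comes from the description above.

```c
#include <stdio.h>

#define TEST_PASS 0    /* zero exit status: the test passed */
#define TEST_FAIL 1    /* any other non-zero status: the test failed */
#define TEST_SKIP 77   /* special status: make check ignores the test */

/* Hypothetical check: the first byte of the fixture file PATH must be
   EXPECTED.  If the fixture is absent there is nothing to test, so the
   caller is told to skip rather than to fail. */
int
check_fixture (const char *path, int expected)
{
  FILE *fp = fopen (path, "rb");
  int first;

  if (fp == NULL)
    return TEST_SKIP;
  first = fgetc (fp);
  fclose (fp);
  return first == expected ? TEST_PASS : TEST_FAIL;
}
```

A `main' that simply returns the value of `check_fixture' turns this into a test program that can be listed in `TESTS'.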
Automake also supports the notion of an xfail, which is a test
which is expected to fail. Sometimes this is useful when you want to
track a known failure, but you aren't prepared to fix it right away.
Tests which are expected to fail should be listed in both `TESTS'
and `XFAIL_TESTS'.
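For example, with three hypothetical test scripts of which one tracks a known bug, the two variables might be set like this:

```makefile
## Test names are hypothetical.
TESTS = basic.test options.test known-bug.test
## known-bug.test is expected to fail for now.
XFAIL_TESTS = known-bug.test
```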
The special prefix `check' can be used with primaries to indicate
that the objects should only be built at make check time. For
example, here is how you can build a program that will only be used
during the testing process:
|
check_PROGRAMS = test-program
test_program_SOURCES = ...
|
Automake also supports the use of DejaGNU, the GNU test framework.
DejaGNU support can be enabled using the `dejagnu' option:
|
AUTOMAKE_OPTIONS = dejagnu
|
The resulting `Makefile.in' will include code to invoke the
runtest program appropriately.
8. Bootstrapping
There are many programs in the GNU Autotools, each of which has a complex
set of inputs. When one of these inputs changes, it is important to run
the proper programs in the proper order. Unfortunately, it is hard to
remember both the dependencies and the ordering.
For instance, whenever you edit `configure.in', you must remember
to re-run aclocal in case you added a reference to a new macro.
You must also rebuild `configure' by running autoconf;
`config.h' by running autoheader, in case you added a new
AC_DEFINE; and automake to propagate any new
AC_SUBSTs to the various `Makefile.in's. If you edit a
`Makefile.am', you must re-run automake. In both these
cases, you must then remember to re-run config.status --recheck
if `configure' changed, followed by config.status to rebuild
the `Makefile's.
When doing active development on the build system for your project,
these dependencies quickly become painful. Of course, Automake knows
how to handle this automatically. By default, automake
generates a `Makefile.in' which knows all these dependencies and
which automatically re-runs the appropriate tools in the appropriate
order. These rules assume that the correct versions of the tools are
all in your PATH.
It helps to have a script ready to do all of this for you once, before
you have generated a `Makefile' that will automatically run the
tools in the correct order, or when you make a fresh checkout of the
code from a CVS repository where the developers don't keep
generated files under source control. There are at least two opposing
schools of thought regarding how to go about this -- the
autogen.sh school and the bootstrap school:
autogen.sh
- From the outset, this is a poor name for a bootstrap script, since there
is already a GNU automatic text generation tool called AutoGen.
Often packages that follow this convention have the script automatically
run the generated configure script after the bootstrap process,
passing autogen.sh arguments through to configure.
Except you don't know what options you want yet, since you can't run
`configure --help' until configure has been generated. I
suggest that if you find yourself compiling a project set up in this way
that you type:
|
$ /bin/sh ./autogen.sh --help
|
and ignore the spurious warning that tells you configure will
be executed.
bootstrap
- Increasingly, projects are starting to call their bootstrap scripts
`bootstrap'. Such scripts simply run the various commands required
to bring the source tree into a state where the end user can simply:
|
$ configure
$ make
$ make install
|
Unfortunately, proponents of this school of thought don't put the
bootstrap script in their distributed tarballs, since the script is
unnecessary except when the build environment of a developer's machine
has changed. This means the proponents of the autogen.sh school may
never see the advantages of the other method.
Autoconf comes with a program called autoreconf which essentially
does the work of the bootstrap script. autoreconf is
rarely used because, historically, it has not been very well known, and only
in Autoconf 2.13 did it acquire the ability to work with Automake.
Unfortunately, even the Autoconf 2.13 autoreconf does not handle
libtoolize and some automake-related options that are
frequently nice to use.
We recommend the bootstrap method, until autoreconf is
fixed. At this point bootstrap has not been standardized, so
here is a version of the script we used while writing this book (3):
|
#! /bin/sh
aclocal \
&& automake --gnu --add-missing \
&& autoconf
|
We don't use autoreconf here because that script (as of Autoconf
2.13) also does not handle the `--add-missing' option, which we
want. A typical bootstrap might also run libtoolize or
autoheader.
It is also important for all developers on a project to have the same
versions of the tools installed so that these rules don't inadvertently
cause problems due to differences between tool versions. This version
skew problem turns out to be fairly significant in the field. So,
automake provides a way to disable these rules by default,
while still allowing users to enable them when they know their
environment is set up correctly.
In order to enable this mode, you must first add
AM_MAINTAINER_MODE to `configure.in'. This will add the
`--enable-maintainer-mode' option to `configure'; when
specified this flag will cause these so-called `maintainer rules' to
be enabled.
Note that maintainer mode is a controversial feature. Some people like
to use it because it causes fewer bug reports in some situations. For
instance, CVS does not preserve relative timestamps on
files. If your project has both `configure.in' and
`configure' checked in, and maintainer mode is not in use, then
sometimes make will decide to rebuild `configure' even
though it is not really required. This in turn means more headaches for
your developers -- on a large project most developers won't touch
`configure.in' and many may not even want to install the GNU Autotools (4).
The other camp claims that end users should use the same build system
that developers use, that maintainer mode is simply unaesthetic, and
furthermore that the modality of maintainer mode is dangerous--you can
easily forget what mode you are in and thus forget to rebuild, and thus
correctly test, a change to the configure or build system. When
maintainer mode is not in use, the Automake-supplied missing
script will be used to warn users when it appears that they need a
maintainer tool that they do not have.
9. A Small GNU Autotools Project
This chapter introduces a small--but real--worked example, to
illustrate some of the features, and highlight some of the pitfalls, of
the GNU Autotools discussed so far. All of the source can be downloaded
from the book's web page (5).
The text is peppered with my own pet ideas, accumulated over several
years of working with the GNU Autotools, and you should be able to easily
apply these to your own projects. I will begin by describing some of
the choices and problems I encountered during the early stages of the
development of this project. Then, by way of illustration of the issues
covered, I will move on to showing you a general infrastructure that I use as
the basis for all of my own projects, followed by the specifics of the
implementation of a portable command line shell library. This chapter
then finishes with a sample shell application that uses that library.
9.1.1 Project Directory Structure
Before starting to write code for any project, you need to decide on
the directory structure you will use to organise the code. I like to
build each component of a project in its own subdirectory, and to keep
the configuration sources separate from the source code. The great
majority of GNU projects I have seen use a similar method, so
adopting it yourself will likely make your project more familiar to your
developers by association.
The top level directory is used for configuration files, such as
`configure' and `aclocal.m4', and for a few other sundry
files, `README' and a copy of the project license for example.
Any significant libraries will have a subdirectory of their own,
containing all of the sources and headers for that library along with a
`Makefile.am' and anything else that is specific to just that
library. Libraries that belong to a small group of similar components, a
set of pluggable application modules for example, are kept together in a
single directory.
The sources and headers for the project's main application will be
stored in yet another subdirectory, traditionally named `src'. There
are other conventional directories your developers might expect too: A
`doc' directory for project documentation; and a `test'
directory for the project self test suite.
To keep the project top-level directory as uncluttered as possible, as I
like to do, you can take advantage of Autoconf's
`AC_CONFIG_AUX_DIR' by creating another directory, say
`config', which will be used to store many of the GNU Autotools
intermediate files, such as install-sh. I always store all
project specific Autoconf M4 macros in this same subdirectory.
So, this is what you should start with:
|
$ pwd
~/mypackage
$ ls -F
Makefile.am config/ configure.in lib/ test/
README configure* doc/ src/
|
9.1.2 C Header Files
There is a small amount of boiler-plate that should be added to all
header files, not least of which is a small amount of code to prevent
the contents of the header from being scanned multiple times. This is
achieved by enclosing the entire file in a preprocessor conditional
which evaluates to false after the first time it has been seen by the
preprocessor. Traditionally, the macro used is in all upper case, and
named after the installation path without the installation prefix.
Imagine a header that will be installed to
`/usr/local/include/sys/foo.h', for example. The preprocessor
code would be as follows:
|
#ifndef SYS_FOO_H
#define SYS_FOO_H 1
...
#endif /* !SYS_FOO_H */
|
Apart from comments, the entire content of the rest of this header file
must be between these few lines. It is worth mentioning that inside the
enclosing ifndef, the macro SYS_FOO_H must be defined
before any other files are #included. It is a common mistake to
not define that macro until the end of the file, but mutual dependency
cycles are only stalled if the guard macro is defined before the
#include which starts that cycle (6).
If a header is designed to be installed, it must #include other
installed project headers from the local tree using angle-brackets.
There are some implications to working like this:
- You must be careful that the names of header file directories in the
  source tree match the names of the directories in the install tree. For
  example, when I plan to install the aforementioned `foo.h' to
  `/usr/local/include/project/foo.h', from which it will be included
  using `#include <project/foo.h>', then in order for the same
  include line to work in the source tree, I must name the source
  directory it is installed from `project' too, or other headers which
  use it will not be able to find it until after it has been installed.
- When you come to developing the next version of a project laid out in
  this way, you must be careful about finding the correct header.
  Automake takes care of that for you by using `-I' options that
  force the compiler to look for uninstalled headers in the current source
  directory before searching the system directories for installed headers
  of the same name.
- You don't have to install all of your headers to `/usr/include' --
  you can use subdirectories. And all without having to rewrite the
  headers at install time.
9.1.3 C++ Compilers
In order for a C++ program to use a library compiled with a C compiler,
it is necessary for any symbols exported from the C library to be
declared between `extern "C" {' and `}'. This code is
important, because a C++ compiler mangles (7) all variable and function names,
whereas a C compiler does not. On the other hand, a C compiler will not
understand these lines, so you must be careful to make them invisible
to the C compiler.
Sometimes you will see this method used, written out in long hand in
every installed header file, like this:
|
#ifdef __cplusplus
extern "C" {
#endif
...
#ifdef __cplusplus
}
#endif
|
But that is a lot of unnecessary typing if you have a few dozen headers
in your project. Also the additional braces tend to confuse text
editors, such as emacs, which do automatic source indentation based on
brace characters.
Far better, then, to declare them as macros in a common header file, and
use the macros in your headers:
|
#ifdef __cplusplus
#  define BEGIN_C_DECLS extern "C" {
#  define END_C_DECLS   }
#else /* !__cplusplus */
#  define BEGIN_C_DECLS
#  define END_C_DECLS
#endif /* __cplusplus */
|
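To show how a header might then use these macros, here is a small self-contained sketch. The macro definitions are repeated so that the fragment compiles on its own. The header name `project/clamp.h', its guard macro and the function `project_clamp' are invented for the example.

```c
/* From the common header: repeated here so the sketch stands alone. */
#ifdef __cplusplus
#  define BEGIN_C_DECLS extern "C" {
#  define END_C_DECLS   }
#else /* !__cplusplus */
#  define BEGIN_C_DECLS
#  define END_C_DECLS
#endif /* __cplusplus */

/* What a hypothetical installed header `project/clamp.h' would contain. */
#ifndef PROJECT_CLAMP_H
#define PROJECT_CLAMP_H 1

BEGIN_C_DECLS

extern int project_clamp (int value, int low, int high);

END_C_DECLS

#endif /* !PROJECT_CLAMP_H */

/* The matching definition, which would normally live in `clamp.c'. */
int
project_clamp (int value, int low, int high)
{
  if (value < low)
    return low;
  if (value > high)
    return high;
  return value;
}
```

A C compiler sees the declaration with nothing around it, while a C++ compiler sees it wrapped in `extern "C" { ... }', so the same header serves both.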
I have seen several projects that name such macros with a leading
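underscore -- `_BEGIN_C_DECLS'. Identifiers that begin with an
underscore followed by an upper-case letter or a second underscore are
reserved for use by the compiler implementation (and all other
identifiers with a leading underscore are reserved at file scope), so you
shouldn't name any symbols of your own in this way. By way of
example, I recently ported the Small (8) language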
compiler to Unix, and almost all of the work was writing a Perl script
to rename huge numbers of symbols in the compiler's reserved namespace
to something more sensible so that GCC could even parse the
sources. Small was originally developed on Windows, and the author had
used a lot of symbols with a leading underscore. Although his symbol
names didn't clash with his own compiler, in some cases they were the
same as symbols used by GCC.
9.1.4 Function Definitions
As a stylistic convention, the return types for all function definitions
should be on a separate line. The main reason for this is that it makes
it very easy to find the functions in a source file, by looking for
a single identifier at the start of a line followed by an open
parenthesis:
| $ egrep '^[_a-zA-Z][_a-zA-Z0-9]*[ \t]*\(' error.c
set_program_name (const char *path)
error (int exit_status, const char *mode, const char *message)
sic_warning (const char *message)
sic_error (const char *message)
sic_fatal (const char *message)
|
There are emacs lisp functions and various code analysis tools, such as
ansi2knr (see section 9.1.6 K&R Compilers), which rely on this
formatting convention, too. Even if you don't use those tools yourself,
your fellow developers might like to, so it is a good convention to
adopt.
9.1.5 Fallback Function Implementations
Due to the huge number of Unix varieties in common use today, many of
the C library functions that you take for granted on your preferred
development platform are very likely missing from some of the
architectures you would like your code to compile on. Fundamentally
there are two ways to cope with this:
-
Use only the few library calls that are available everywhere. In
reality this is not actually possible because there are two lowest
common denominators with mutually exclusive APIs, one rooted in
BSD Unix (`bcopy', `rindex') and the other in
SYSV Unix (`memcpy', `strrchr'). The only way to deal
with this is to define one API in terms of the other using the
preprocessor. The newer POSIX standard deprecates many of the
BSD originated calls (with exceptions such as the
BSD socket API). Even on non-POSIX platforms, there
has been so much cross pollination that often both varieties of a given
call may be provided; however, you would be wise to write your code
using POSIX endorsed calls, and where they are missing, define them
in terms of whatever the host platform provides.
This approach requires a lot of knowledge about various system libraries
and standards documents, and can leave you with reams of preprocessor
code to handle the differences between APIs. You will also need
to perform a lot of checking in `configure.in' to figure out which
calls are available. For example, to allow the rest of your code to use
the `strcpy' call with impunity, you would need the following code
in `configure.in':
|
AC_CHECK_FUNCS(strcpy bcopy)
|
And the following preprocessor code in a header file that is seen by
every source file:
|
#if !HAVE_STRCPY
# if HAVE_BCOPY
# define strcpy(dest, src) bcopy (src, dest, 1 + strlen (src))
# else /* !HAVE_BCOPY */
error no strcpy or bcopy
# endif /* HAVE_BCOPY */
#endif /* HAVE_STRCPY */
|
-
Alternatively you could provide your own fallback implementations of
function calls you know are missing on some platforms. In practice you
don't need to be as knowledgeable about problematic functions when using
this approach. You can look in GNU libiberty(9) or François
Pinard's libit project(10) to see for which
functions other GNU developers have needed to implement fallback
code. The libit project is especially useful in this respect as it
comprises canonical versions of fallback functions, and suitable
Autoconf macros assembled from across the entire GNU project. I
won't give an example of setting up your package to use this approach,
since that is how I have chosen to structure the project described in
this chapter.
Rather than writing code to the lowest common denominator of system
libraries, I am a strong advocate of the latter school of thought in the
majority of cases. As with all things it pays to take a pragmatic
approach; don't be afraid of the middle ground -- weigh the options on
a case by case basis.
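To give a feel for the second approach, here is a minimal sketch of what a fallback implementation might look like. The name `fallback_strrchr' is hypothetical, used so that the sketch cannot clash with the system library; a real replacement file would define `strrchr' itself and would be compiled only when `configure' reports the function missing:

```c
#include <stddef.h>

/* Hypothetical fallback in the style of a replacement source file.
   Returns a pointer to the last occurrence of C in S, or NULL. */
char *
fallback_strrchr (const char *s, int c)
{
  const char *last = NULL;

  do
    {
      if (*s == (char) c)
        last = s;               /* remember the most recent match */
    }
  while (*s++);                 /* the terminating '\0' is examined too */

  return (char *) last;
}
```

Because the loop also examines the terminating null character, searching for `'\0'' returns a pointer to the end of the string, as the standard function does.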
9.1.6 K&R Compilers
K&R C is the name now used to describe the original C language specified
by Brian Kernighan and Dennis Ritchie (hence, `K&R'). I have
yet to see a C compiler that doesn't support code written in the K&R
style, yet it has fallen very much into disuse in favor of the newer
ANSI C standard. Although it is increasingly common for vendors to
unbundle their ANSI C compiler, the GCC
project (11) is available for all of the architectures I have ever
used.
There are four differences between the two C standards:
-
ANSI C expects full type specification in function prototypes, such
as you might supply in a library header file:
|
extern int functionname (const char *parameter1, size_t parameter2);
|
The nearest equivalent in K&R style C is a forward declaration, which
allows you to use a function before its corresponding definition:
|
extern int functionname ();
|
As you can imagine, K&R has very bad type safety, and does not perform
any checks that only function arguments of the correct type are used.
-
The function headers of each function definition are written
differently. Where you might see the following written in ANSI C:
|
int
functionname (const char *parameter1, size_t parameter2)
{
...
}
|
K&R expects the parameter type declarations separately, like this:
|
int
functionname (parameter1, parameter2)
const char *parameter1;
size_t parameter2;
{
...
}
|
-
There is no concept of an untyped pointer in K&R C. Where you might be
used to seeing `void *' pointers in ANSI code, you are forced
to overload the meaning of `char *' for K&R compilers.
-
Variadic functions are handled with a different API in K&R C,
imported with `#include <varargs.h>'. A K&R variadic function
definition looks like this:
|
int
functionname (va_alist)
va_dcl
{
va_list ap;
char *arg;
va_start (ap);
...
arg = va_arg (ap, char *);
...
va_end (ap);
return arg ? strlen (arg) : 0;
}
|
ANSI C provides a similar API, imported with `#include
<stdarg.h>', though it cannot express a variadic function with no named
arguments such as the one above. In practice, this isn't a problem
since you always need at least one parameter, either to specify the
total number of arguments somehow, or else to mark the end of the
argument list. An ANSI variadic function definition looks like
this:
|
int
functionname (char *format, ...)
{
va_list ap;
char *arg;
va_start (ap, format);
...
arg = va_arg (ap, char *);
...
va_end (ap);
return format ? strlen (format) : 0;
}
|
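The ANSI fragment above elides its loop. As a complete sketch, assuming the first parameter is used to pass the total number of arguments (one of the two options just mentioned), a small ANSI variadic function might look like this. The name `sum_ints' is hypothetical:

```c
#include <stdarg.h>

/* Add up COUNT int arguments that follow the named parameter. */
int
sum_ints (int count, ...)
{
  va_list ap;
  int total = 0;
  int i;

  va_start (ap, count);         /* ANSI va_start names the last fixed parameter */
  for (i = 0; i < count; ++i)
    total += va_arg (ap, int);
  va_end (ap);

  return total;
}
```

A call such as `sum_ints (3, 1, 2, 3)' reads exactly three values; passing a count larger than the number of arguments actually supplied is undefined behaviour, which is why the caller and callee must agree on the convention.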
Except in very rare cases where you are writing a low level project
(GCC for example), you probably don't need to worry about K&R
compilers too much. However, supporting them can be very easy, and if
you are so inclined, can be handled either by employing the
ansi2knr program supplied with Automake, or by careful use of
the preprocessor.
Using ansi2knr in your project is described in some detail in
section `Automatic de-ANSI-fication' in The Automake Manual, but
boils down to the following:
-
Add this macro to your `configure.in' file:
-
Rewrite the contents of `LIBOBJS' and/or `LTLIBOBJS' in
the following fashion:
|
# This is necessary so that .o files in LIBOBJS are also built via
# the ANSI2KNR-filtering rules.
Xsed='sed -e "s/^X//"'
LIBOBJS=`echo X"$LIBOBJS"|\
[$Xsed -e 's/\.[^.]* /.\$U& /g;s/\.[^.]*$/.\$U&/']`
|
Personally, I dislike this method, since every source file is filtered
and rewritten with ANSI function prototypes and declarations
converted to K&R style adding a fair overhead in additional files in
your build tree, and in compilation time. This would be reasonable were
the abstraction sufficient to allow you to forget about K&R entirely,
but ansi2knr is a simple program, and does not address any of
the other differences between compilers that I raised above, and it
cannot handle macros in your function prototypes or definitions. If you
decide to use ansi2knr in your project, you must make the
decision before you write any code, and be aware of its limitations as
you develop.
For my own projects, I prefer to use a set of preprocessor macros along
with a few stylistic conventions so that all of the differences between
K&R and ANSI compilers are actually addressed, and so that the
unfortunate few who have no access to an ANSI compiler (and who
cannot use GCC for some reason) needn't suffer the overheads of
ansi2knr .
The four differences in style listed at the beginning of this subsection
are addressed as follows:
-
The function prototype argument lists are declared inside a
PARAMS
macro invocation so that K&R compilers will still be able to compile the
source tree. PARAMS removes ANSI argument lists from
function prototypes for K&R compilers. Some developers
continue to use __P for this purpose, but strictly speaking,
macros starting with `_' (and especially `__') are reserved
for the compiler and the system headers, so using `PARAMS', as
follows, is safer:
|
#if __STDC__
# ifndef NOPROTOS
# define PARAMS(args) args
# endif
#endif
#ifndef PARAMS
# define PARAMS(args) ()
#endif
|
This macro is then used for all function declarations like this:
|
extern int functionname PARAMS((const char *parameter));
|
-
When the
PARAMS macro is used for all function declarations,
ANSI compilers are given all the type information they require to
do full compile time type checking. The function definitions
proper must then be declared in K&R style so that K&R compilers don't
choke on ANSI syntax. There is a small amount of overhead in
writing code this way, however: The ANSI compile time type
checking can only work in conjunction with K&R function definitions if
it first sees an ANSI function prototype. This forces you to
develop the good habit of prototyping every single function in
your project. Even the static ones.
-
The easiest way to work around the lack of
void * pointers, is to
define a new type that is conditionally set to void * for
ANSI compilers, or char * for K&R compilers. You
should add the following to a common header file:
|
#if __STDC__
typedef void *void_ptr;
#else /* !__STDC__ */
typedef char *void_ptr;
#endif /* __STDC__ */
|
-
The difference between the two variadic function APIs poses a
stickier problem, and the solution is ugly. But it does work.
First you must check for the headers in `configure.in':
|
AC_CHECK_HEADERS(stdarg.h varargs.h, break)
|
Having done this, add the following code to a common header file:
|
#if HAVE_STDARG_H
# include <stdarg.h>
# define VA_START(a, f) va_start(a, f)
#else
# if HAVE_VARARGS_H
# include <varargs.h>
# define VA_START(a, f) va_start(a)
# endif
#endif
#ifndef VA_START
error no variadic api
#endif
|
You must now supply each variadic function with both a K&R and an
ANSI definition, like this:
|
int
#if HAVE_STDARG_H
functionname (const char *format, ...)
#else
functionname (format, va_alist)
const char *format;
va_dcl
#endif
{
va_list ap;
char *arg;
VA_START (ap, format);
...
arg = va_arg (ap, char *);
...
va_end (ap);
return arg ? strlen (arg) : 0;
}
|
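Putting the variadic pieces together, the following is a sketch of one complete function written in this dual style. It assumes that `configure' found `stdarg.h', and defines `HAVE_STDARG_H' by hand to stand in for `config.h'; the function name `total_length' and the NULL sentinel convention are choices made for this illustration only. The `varargs.h' branch follows the pattern shown above but is not exercised on a modern compiler:

```c
/* Assumption: normally config.h supplies this after the configure check. */
#ifndef HAVE_STDARG_H
#  define HAVE_STDARG_H 1
#endif

#include <stddef.h>
#include <string.h>

#if HAVE_STDARG_H
#  include <stdarg.h>
#  define VA_START(a, f) va_start (a, f)
#else
#  include <varargs.h>
#  define VA_START(a, f) va_start (a)
#endif

/* Total length of FIRST and every following string, up to a NULL sentinel. */
size_t
#if HAVE_STDARG_H
total_length (const char *first, ...)
#else
total_length (first, va_alist)
     const char *first;
     va_dcl
#endif
{
  va_list ap;
  const char *arg;
  size_t total = 0;

  VA_START (ap, first);
  for (arg = first; arg != NULL; arg = va_arg (ap, const char *))
    total += strlen (arg);
  va_end (ap);

  return total;
}
```

The sentinel must be passed as a pointer, for example `total_length ("ab", "cde", (const char *) NULL)', because a bare `0' is not guaranteed to have pointer size in a variadic call.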
9.2 A Simple Shell Builders Library
An application which most developers try their hand at sooner or later
is a Unix shell. There is a lot of functionality common to all
traditional command line shells, which I thought I would push into a
portable library to get you over the first hurdle when that moment is
upon you. Before elaborating on any of this I need to name the
project. I've called it sic, from the Latin so it is,
because like all good project names it is somewhat pretentious and it
lends itself to the recursive acronym sic is cumulative.
The gory detail of the minutiae of the source is beyond the scope of
this book, but to convey a feel for the need for Sic, some of the
goals which influenced the design follow:
-
Sic must be very small so that, in addition to being used as the basis
for a full blown shell, it can be linked (unadorned) into an application
and used for trivial tasks, such as reading startup configuration.
-
It must not be tied to a particular syntax or set of reserved words. If
you use it to read your startup configuration, I don't want to force you
to use my syntax and commands.
-
The boundary between the library (`libsic') and the application
must be well defined. Sic will take strings of characters as input, and
internally parse and evaluate them according to registered commands and
syntax, returning results or diagnostics as appropriate.
-
It must be extremely portable -- that is what I am trying to illustrate
here, after all.
9.2.1 Portability Infrastructure
As I explained in 9.1.1 Project Directory Structure, I'll first create
the project directories, a toplevel directory and a subdirectory to put
the library sources into. I want to install the library header files
to `/usr/local/include/sic', so the library subdirectory must be
named appropriately. See section 9.1.2 C Header Files.
|
$ mkdir sic
$ mkdir sic/sic
$ cd sic/sic
|
I will describe the files I add in this section in more detail than the
project specific sources, because they comprise an infrastructure that I
use relatively unchanged for all of my GNU Autotools projects. You could
keep an archive of these files, and use them as a starting point
each time you begin a new project of your own.
9.2.1.1 Error Management
A good place to start with any project design is the error management
facility. In Sic I will use a simple group of functions to display
simple error messages. Here is `sic/error.h':
|
#ifndef SIC_ERROR_H
#define SIC_ERROR_H 1
#include <sic/common.h>
BEGIN_C_DECLS
extern const char *program_name;
extern void set_program_name (const char *argv0);
extern void sic_warning (const char *message);
extern void sic_error (const char *message);
extern void sic_fatal (const char *message);
END_C_DECLS
#endif /* !SIC_ERROR_H */
|
This header file follows the principles set out in 9.1.2 C Header Files.
I am storing the program_name variable in the library that uses
it, so that I can be sure that the library will build on architectures
that don't allow undefined symbols in libraries (12).
Keeping those preprocessor macro definitions designed to aid code
portability together (in a single file), is a good way to maintain the
readability of the rest of the code. For this project I will put that
code in `common.h':
|
#ifndef SIC_COMMON_H
#define SIC_COMMON_H 1
#if HAVE_CONFIG_H
# include <sic/config.h>
#endif
#include <stdio.h>
#include <sys/types.h>
#if STDC_HEADERS
# include <stdlib.h>
# include <string.h>
#elif HAVE_STRINGS_H
# include <strings.h>
#endif /*STDC_HEADERS*/
#if HAVE_UNISTD_H
# include <unistd.h>
#endif
#if HAVE_ERRNO_H
# include <errno.h>
#endif /*HAVE_ERRNO_H*/
#ifndef errno
/* Some systems #define this! */
extern int errno;
#endif
#endif /* !SIC_COMMON_H */
|
You may recognise some snippets of code from the Autoconf manual here,
in particular the inclusion of the project `config.h', which will
be generated shortly. Notice that I have been careful to conditionally
include any headers which are not guaranteed to exist on every
architecture. The rule of thumb here is that only `stdio.h' is
ubiquitous (though I have never heard of a machine that has no
`sys/types.h'). You can find more details of some of these in
section `Existing Tests' in The GNU Autoconf Manual.
Here is a little more code from `common.h':
|
#ifndef EXIT_SUCCESS
# define EXIT_SUCCESS 0
# define EXIT_FAILURE 1
#endif
|
The implementation of the error handling functions goes in
`error.c' and is very straightforward:
|
#if HAVE_CONFIG_H
# include <sic/config.h>
#endif
#include "common.h"
#include "error.h"
static void error (int exit_status, const char *mode,
const char *message);
static void
error (int exit_status, const char *mode, const char *message)
{
fprintf (stderr, "%s: %s: %s.\n", program_name, mode, message);
if (exit_status >= 0)
exit (exit_status);
}
void
sic_warning (const char *message)
{
error (-1, "warning", message);
}
void
sic_error (const char *message)
{
error (-1, "ERROR", message);
}
void
sic_fatal (const char *message)
{
error (EXIT_FAILURE, "FATAL", message);
}
|
I also need a definition of program_name ;
set_program_name copies the filename component of path into
the exported data, program_name . The xstrdup function
just calls strdup , but abort s if there is not enough
memory to make the copy:
|
const char *program_name = NULL;
void
set_program_name (const char *path)
{
if (!program_name)
program_name = xstrdup (basename (path));
}
|
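For readers without `basename' or `xstrdup' to hand, here is a self-contained sketch of the same idea. It is not the Sic implementation: the `demo_' names are hypothetical, `strrchr' stands in for `basename' (so only `/' is treated as a directory separator), and a plain `malloc' check stands in for `xstrdup':

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

const char *demo_program_name = NULL;

/* Record the filename component of PATH, the first time it is called. */
void
demo_set_program_name (const char *path)
{
  const char *slash;
  char *copy;

  if (demo_program_name)
    return;                     /* later calls have no effect */

  slash = strrchr (path, '/');
  if (slash)
    path = slash + 1;           /* keep only the filename component */

  copy = (char *) malloc (strlen (path) + 1);
  if (!copy)
    {
      fprintf (stderr, "%s: FATAL: Memory exhausted.\n", path);
      exit (EXIT_FAILURE);
    }
  strcpy (copy, path);
  demo_program_name = copy;
}
```

An application would typically call this once at the top of `main', passing `argv[0]', so that every later diagnostic is prefixed with the name the program was invoked under.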
9.2.1.2 Memory Management
A useful idiom common to many GNU projects is to wrap the memory
management functions to localise out of memory handling, naming
them with an `x' prefix. By doing this, the rest of the project is
relieved of having to remember to check for `NULL' returns from the
various memory functions. These wrappers use the error API
to report memory exhaustion and abort the program. I have placed the
implementation code in `xmalloc.c':
|
#if HAVE_CONFIG_H
# include <sic/config.h>
#endif
#include "common.h"
#include "error.h"
void *
xmalloc (size_t num)
{
void *new = malloc (num);
if (!new)
sic_fatal ("Memory exhausted");
return new;
}
void *
xrealloc (void *p, size_t num)
{
void *new;
if (!p)
return xmalloc (num);
new = realloc (p, num);
if (!new)
sic_fatal ("Memory exhausted");
return new;
}
void *
xcalloc (size_t num, size_t size)
{
void *new = xmalloc (num * size);
bzero (new, num * size);
return new;
}
|
Notice in the code above, that xcalloc is implemented in terms of
xmalloc , since calloc itself is not available in some
older C libraries.
Rather than create a separate `xmalloc.h' file, which would need to
be #include d from almost everywhere else, the logical place to
declare these functions is in `common.h', since the wrappers will
be called from most everywhere else in the code:
|
#ifdef __cplusplus
# define BEGIN_C_DECLS extern "C" {
# define END_C_DECLS }
#else
# define BEGIN_C_DECLS
# define END_C_DECLS
#endif
#define XCALLOC(type, num) \
((type *) xcalloc ((num), sizeof(type)))
#define XMALLOC(type, num) \
((type *) xmalloc ((num) * sizeof(type)))
#define XREALLOC(type, p, num) \
((type *) xrealloc ((p), (num) * sizeof(type)))
#define XFREE(stale) do { \
if (stale) { free (stale); stale = 0; } \
} while (0)
BEGIN_C_DECLS
extern void *xcalloc (size_t num, size_t size);
extern void *xmalloc (size_t num);
extern void *xrealloc (void *p, size_t num);
extern char *xstrdup (const char *string);
extern char *xstrerror (int errnum);
END_C_DECLS
|
By using the macros defined here, allocating and freeing heap memory is
reduced from:
|
char **argv = (char **) xmalloc (sizeof (char *) * 3);
do_stuff (argv);
if (argv)
free (argv);
|
to the simpler and more readable:
|
char **argv = XMALLOC (char *, 3);
do_stuff (argv);
XFREE (argv);
|
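The fragment above can be fleshed out into a complete sketch. Here `demo_xmalloc' is a hypothetical stand-in for the `xmalloc' wrapper (it reports the failure itself rather than calling `sic_fatal'), and `sum_of_squares' is an invented function that merely gives the macros something to do:

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for the xmalloc wrapper: never returns NULL. */
static void *
demo_xmalloc (size_t num)
{
  void *p = malloc (num ? num : 1);   /* avoid a NULL return for size 0 */
  if (!p)
    {
      fprintf (stderr, "FATAL: Memory exhausted.\n");
      exit (EXIT_FAILURE);
    }
  return p;
}

#define XMALLOC(type, num) ((type *) demo_xmalloc ((num) * sizeof (type)))
#define XFREE(stale) do { \
    if (stale) { free (stale); stale = 0; } \
  } while (0)

/* Return 0*0 + 1*1 + ... + (n-1)*(n-1), using a heap-allocated table. */
int
sum_of_squares (int n)
{
  int *squares = XMALLOC (int, n);
  int i, total = 0;

  for (i = 0; i < n; ++i)
    squares[i] = i * i;
  for (i = 0; i < n; ++i)
    total += squares[i];

  XFREE (squares);              /* frees the block and resets the pointer */
  return total;
}
```

Because `XFREE' resets its argument to 0, an accidental second `XFREE (squares)' would be harmless, which is a large part of the appeal of the macro.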
In the same spirit, I have borrowed `xstrdup.c' and
`xstrerror.c' from the GNU project's libiberty. See section 9.1.5 Fallback Function Implementations.
9.2.1.3 Generalised List Data Type
In many C programs you will see various implementations and
re-implementations of lists and stacks, each tied to its own particular
project. It is surprisingly simple to write a catch-all implementation,
as I have done here with a generalised list operation API in
`list.h':
|
#ifndef SIC_LIST_H
#define SIC_LIST_H 1
#include <sic/common.h>
BEGIN_C_DECLS
typedef struct list {
struct list *next; /* chain forward pointer*/
void *userdata; /* in case you want to use raw Lists */
} List;
extern List *list_new (void *userdata);
extern List *list_cons (List *head, List *tail);
extern List *list_tail (List *head);
extern size_t list_length (List *head);
END_C_DECLS
#endif /* !SIC_LIST_H */
|
The trick is to ensure that any structures you want to chain together
have their forward pointer in the first field. Having done that, the
generic functions declared above can be used to manipulate any such
chain by casting it to List * and back again as necessary.
For example:
| struct foo {
struct foo *next;
char *bar;
struct baz *qux;
...
};
...
struct foo *foo_list = NULL;
foo_list = (struct foo *) list_cons ((List *) new_foo (),
(List *) foo_list);
...
|
The implementation of the list manipulation functions is in
`list.c':
|
#include "list.h"
List *
list_new (void *userdata)
{
List *new = XMALLOC (List, 1);
new->next = NULL;
new->userdata = userdata;
return new;
}
List *
list_cons (List *head, List *tail)
{
head->next = tail;
return head;
}
List *
list_tail (List *head)
{
return head->next;
}
size_t
list_length (List *head)
{
size_t n;
for (n = 0; head; ++n)
head = head->next;
return n;
}
|
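The `userdata' field means the list can also be used "raw", without embedding the forward pointer in a structure of your own. The following sketch shows that usage under some assumptions: the `demo_' functions are local copies of the list functions above, a plain `malloc' stands in for `XMALLOC', and `demo_total_chars' is an invented example function:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct list {
  struct list *next;            /* chain forward pointer */
  void *userdata;               /* payload for raw Lists */
} List;

static List *
demo_list_new (void *userdata)
{
  List *new_node = (List *) malloc (sizeof (List));
  if (!new_node)
    abort ();                   /* the real code reports this via XMALLOC */
  new_node->next = NULL;
  new_node->userdata = userdata;
  return new_node;
}

static List *
demo_list_cons (List *head, List *tail)
{
  head->next = tail;
  return head;
}

/* Push each word onto a raw List used as a stack, then pop the nodes
   one by one, adding up the lengths of the strings they point to. */
size_t
demo_total_chars (const char *const *words, size_t count)
{
  List *stack = NULL;
  List *node;
  size_t i, total = 0;

  for (i = 0; i < count; ++i)
    stack = demo_list_cons (demo_list_new ((void *) words[i]), stack);

  while (stack)
    {
      node = stack;
      total += strlen ((const char *) node->userdata);
      stack = node->next;
      free (node);
    }
  return total;
}
```

Note that the list owns only its nodes; the strings stored in `userdata' belong to the caller and are not freed here.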
9.2.2.1 `sic.c' & `sic.h'
Here are the functions for creating and managing sic parsers.
|
#ifndef SIC_SIC_H
#define SIC_SIC_H 1
#include <sic/common.h>
#include <sic/error.h>
#include <sic/list.h>
#include <sic/syntax.h>
typedef struct sic {
char *result; /* result string */
size_t len; /* bytes used by result field */
size_t lim; /* bytes allocated to result field */
struct builtintab *builtins; /* tables of builtin functions */
SyntaxTable **syntax; /* dispatch table for syntax of input */
List *syntax_init; /* stack of syntax state initialisers */
List *syntax_finish; /* stack of syntax state finalizers */
SicState *state; /* state data from syntax extensions */
} Sic;
#endif /* !SIC_SIC_H */
|
9.2.2.2 `builtin.c' & `builtin.h'
Here are the functions for managing tables of builtin commands in each
Sic structure:
|
typedef int (*builtin_handler) (Sic *sic,
int argc, char *const argv[]);
typedef struct {
const char *name;
builtin_handler func;
int min, max;
} Builtin;
typedef struct builtintab BuiltinTab;
extern Builtin *builtin_find (Sic *sic, const char *name);
extern int builtin_install (Sic *sic, Builtin *table);
extern int builtin_remove (Sic *sic, Builtin *table);
|
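To illustrate how a table of this shape might be searched and dispatched, here is a simplified, self-contained sketch. It is not the libsic implementation: the handlers take no `Sic' pointer, the table is a single static array terminated by a NULL name, and I am assuming that `min' and `max' bound the number of arguments after the command name, with a negative `max' meaning "no upper limit". All of the `demo_' names are hypothetical:

```c
#include <stddef.h>
#include <string.h>

typedef int (*demo_handler) (int argc, char *const argv[]);

typedef struct {
  const char *name;
  demo_handler func;
  int min, max;                 /* assumed: allowed argument count range */
} DemoBuiltin;

/* Toy handler: reports how many arguments it was given. */
static int
demo_echo (int argc, char *const argv[])
{
  (void) argv;
  return argc - 1;
}

/* Toy handler: always succeeds. */
static int
demo_true (int argc, char *const argv[])
{
  (void) argc;
  (void) argv;
  return 0;
}

static DemoBuiltin demo_table[] = {
  { "echo", demo_echo, 0, -1 },
  { "true", demo_true, 0,  0 },
  { NULL,   NULL,      0,  0 }
};

/* Linear search of a NULL-terminated table. */
DemoBuiltin *
demo_builtin_find (DemoBuiltin *table, const char *name)
{
  for (; table->name; ++table)
    if (strcmp (table->name, name) == 0)
      return table;
  return NULL;
}

/* Look up argv[0] and call its handler; -1 means unknown command
   or wrong number of arguments. */
int
demo_dispatch (int argc, char *const argv[])
{
  DemoBuiltin *builtin;
  int nargs = argc - 1;

  if (argc < 1)
    return -1;
  builtin = demo_builtin_find (demo_table, argv[0]);
  if (!builtin)
    return -1;
  if (nargs < builtin->min || (builtin->max >= 0 && nargs > builtin->max))
    return -1;
  return builtin->func (argc, argv);
}
```

Checking the argument count in the dispatcher keeps that logic out of every individual handler, which is presumably why the limits are stored in the table rather than in the functions.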
9.2.2.3 `eval.c' & `eval.h'
Having created a Sic parser, and populated it with some
Builtin handlers, a user of this library must tokenize and
evaluate its input stream. These files define a structure for storing
tokenized strings (Tokens ), and functions for converting
char * strings both to and from this structure type:
|
#ifndef SIC_EVAL_H
#define SIC_EVAL_H 1
#include <sic/common.h>
#include <sic/sic.h>
BEGIN_C_DECLS
typedef struct {
int argc; /* number of elements in ARGV */
char **argv; /* array of pointers to elements */
size_t lim; /* number of bytes allocated */
} Tokens;
extern int eval (Sic *sic, Tokens *tokens);
extern int untokenize (Sic *sic, char **pcommand, Tokens *tokens);
extern int tokenize (Sic *sic, Tokens **ptokens, char **pcommand);
END_C_DECLS
#endif /* !SIC_EVAL_H */
|
These files also define the eval function, which examines a
Tokens structure in the context of the given Sic parser,
dispatching the argv array to a relevant Builtin handler,
also written by the library user.
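As a rough sketch of the default behaviour only (splitting on whitespace), a tokenizer that fills a structure of the same shape as `Tokens' could be written as follows. This is not the libsic `tokenize', which takes a `Sic' parser and consults its syntax handlers; `DemoTokens' and `demo_tokenize' are hypothetical names, and this version splits the string in place rather than copying it:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
  int argc;                     /* number of elements in ARGV */
  char **argv;                  /* array of pointers to elements */
  size_t lim;                   /* number of bytes allocated */
} DemoTokens;

/* Split COMMAND in place on whitespace, storing the words in TOKENS.
   Returns 0 on success, or -1 if memory could not be allocated. */
int
demo_tokenize (DemoTokens *tokens, char *command)
{
  size_t slots = 4;
  char *p = command;

  tokens->argc = 0;
  tokens->argv = (char **) malloc (slots * sizeof (char *));
  if (!tokens->argv)
    return -1;

  while (*p)
    {
      while (*p && isspace ((unsigned char) *p))
        ++p;                    /* skip leading whitespace */
      if (!*p)
        break;

      /* Keep one slot spare for the terminating NULL pointer. */
      if ((size_t) tokens->argc + 1 >= slots)
        {
          char **grown;
          slots *= 2;
          grown = (char **) realloc (tokens->argv, slots * sizeof (char *));
          if (!grown)
            {
              free (tokens->argv);
              tokens->argv = NULL;
              return -1;
            }
          tokens->argv = grown;
        }

      tokens->argv[tokens->argc++] = p;
      while (*p && !isspace ((unsigned char) *p))
        ++p;                    /* scan to the end of the word */
      if (*p)
        *p++ = '\0';            /* terminate the word in place */
    }

  tokens->argv[tokens->argc] = NULL;
  tokens->lim = slots * sizeof (char *);
  return 0;
}
```

The resulting `argc' and NULL-terminated `argv' have the same layout that `main' receives, so they can be handed directly to a handler with the `builtin_handler' signature.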
9.2.2.4 `syntax.c' & `syntax.h'
When tokenize splits a char * string into parts, by
default it breaks the string into words delimited by whitespace. These
files define the interface for changing this default behaviour, by
registering callback functions which the parser will run when it meets
an `interesting' symbol in the input stream. Here are the
declarations from `syntax.h':
|
BEGIN_C_DECLS
typedef int SyntaxHandler (struct sic *sic, BufferIn *in,
BufferOut *out);
typedef struct syntax {
SyntaxHandler *handler;
char *ch;
} Syntax;
extern int syntax_install (struct sic *sic, Syntax *table);
extern SyntaxHandler *syntax_handler (struct sic *sic, int ch);
END_C_DECLS
|
A SyntaxHandler is a function called by tokenize as it
consumes its input to create a Tokens structure; the two
functions associate a table of such handlers with a given Sic
parser, and find the particular handler for a given character in that
Sic parser, respectively.
9.2.3 Beginnings of a `configure.in'
Now that I have some code, I can run autoscan to generate a
preliminary `configure.in'. autoscan will examine all of
the sources in the current directory tree looking for common points of
non-portability, adding macros suitable for detecting the discovered
problems. autoscan generates the following in
`configure.scan':
|
# Process this file with autoconf to produce a configure script.
AC_INIT(sic/eval.h)
# Checks for programs.
# Checks for libraries.
# Checks for header files.
AC_HEADER_STDC
AC_CHECK_HEADERS(strings.h unistd.h)
# Checks for typedefs, structures, and compiler characteristics.
AC_C_CONST
AC_TYPE_SIZE_T
# Checks for library functions.
AC_FUNC_VPRINTF
AC_CHECK_FUNCS(strerror)
AC_OUTPUT()
|
Since the generated `configure.scan' does not overwrite your
project's `configure.in', it is a good idea to run
autoscan periodically even in established project source
trees, and compare the two files. Sometimes autoscan will
find some portability issue you have overlooked, or weren't aware of.
Looking through the documentation for the macros in this
`configure.scan', AC_C_CONST and AC_TYPE_SIZE_T will
take care of themselves (provided I ensure that `config.h' is
included into every source file), and AC_HEADER_STDC and
AC_CHECK_HEADERS(unistd.h) are already taken care of in
`common.h'.
autoscan is no silver bullet! Even here in this
simple example, I need to manually add macros to check for the presence
of `errno.h':
|
AC_CHECK_HEADERS(errno.h strings.h unistd.h)
|
I also need to manually add the Autoconf macro for generating
`config.h'; a macro to initialise automake support; and a
macro to check for the presence of ranlib . These should go
close to the start of `configure.in':
|
...
AC_CONFIG_HEADER(config.h)
AM_INIT_AUTOMAKE(sic, 0.5)
AC_PROG_CC
AC_PROG_RANLIB
...
|
An interesting macro suggested by autoscan is
AC_CHECK_FUNCS(strerror) . This tells me that I need to provide a
replacement implementation of strerror for the benefit of
architectures which don't have it in their system libraries. This is
resolved by providing a file with a fallback implementation for the
named function, and creating a library from it and any others that
`configure' discovers to be lacking from the system library on the
target host.
You will recall that `configure' is the shell script the end user
of this package will run on their machine to test that it has all the
features the package wants to use. The library that is created will
allow the rest of the project to be written in the knowledge that any
functions required by the project but missing from the installer's system
libraries will be available nonetheless. GNU `libiberty'
comes to the rescue again -- it already has an implementation of
`strerror.c' that I was able to use with a little modification.
Being able to supply a simple implementation of strerror , as the
`strerror.c' file from `libiberty' does, relies on there being
a well defined sys_errlist variable. It is a fair bet, however, that if
the target host has no strerror implementation, the
system sys_errlist will also be broken or missing. I need to write a
configure macro to check whether the system defines sys_errlist ,
and tailor the code in `strerror.c' to use this knowledge.
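The kind of tailoring I have in mind can be sketched as follows. This is not the libiberty code: `demo_strerror' is a hypothetical name, the declarations of `sys_errlist' and `sys_nerr' are the traditional ones and are assumed to match the host (some systems declare them with `const'), and the text of the fallback message is invented for the example:

```c
#include <stdio.h>

#if HAVE_SYS_ERRLIST
/* Traditional declarations; assumed to match the host's own. */
extern char *sys_errlist[];
extern int sys_nerr;
#endif

/* Return a message for ERRNUM, using sys_errlist only when the
   configure check has confirmed that it exists. */
const char *
demo_strerror (int errnum)
{
  static char buffer[40];

#if HAVE_SYS_ERRLIST
  if (errnum >= 0 && errnum < sys_nerr && sys_errlist[errnum])
    return sys_errlist[errnum];
#endif

  /* No table, or ERRNUM out of range: build a generic message. */
  sprintf (buffer, "Unknown error %d", errnum);
  return buffer;
}
```

Like the traditional `strerror', this sketch returns a pointer to static storage, so the result must be used or copied before the next call.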
To avoid clutter in the top-level directory, I am a great believer in
keeping as many of the configuration files as possible in their own
sub-directory. First of all, I will create a new directory called
`config' inside the top-level directory, and put
`sys_errlist.m4' inside it:
|
AC_DEFUN(SIC_VAR_SYS_ERRLIST,
[AC_CACHE_CHECK([for sys_errlist],
sic_cv_var_sys_errlist,
[AC_TRY_LINK([int *p;], [extern int sys_errlist; p = &sys_errlist;],
sic_cv_var_sys_errlist=yes, sic_cv_var_sys_errlist=no)])
if test x"$sic_cv_var_sys_errlist" = xyes; then
AC_DEFINE(HAVE_SYS_ERRLIST, 1,
[Define if your system libraries have a sys_errlist variable.])
fi])
|
I must then add a call to this new macro in the `configure.in' file,
being careful to put it in the right place --
somewhere between typedefs and structures and library
functions according to the comments in `configure.scan'.
GNU Autotools can also be set to store most of their files in a
subdirectory, by calling the AC_CONFIG_AUX_DIR macro near the top
of `configure.in', preferably right after AC_INIT :
|
AC_INIT(sic/eval.c)
AC_CONFIG_AUX_DIR(config)
AM_CONFIG_HEADER(config.h)
...
|
Having made this change, many of the files added by running
autoconf and automake --add-missing will be put in
the aux_dir.
The source tree now looks like this:
|
sic/
+-- configure.scan
+-- config/
| +-- sys_errlist.m4
+-- replace/
| +-- strerror.c
+-- sic/
+-- builtin.c
+-- builtin.h
+-- common.h
+-- error.c
+-- error.h
+-- eval.c
+-- eval.h
+-- list.c
+-- list.h
+-- sic.c
+-- sic.h
+-- syntax.c
+-- syntax.h
+-- xmalloc.c
+-- xstrdup.c
+-- xstrerror.c
|
In order to correctly utilise the fallback implementation,
AC_CHECK_FUNCS(strerror) needs to be removed and strerror
added to AC_REPLACE_FUNCS :
|
# Checks for library functions.
AC_REPLACE_FUNCS(strerror)
|
This will be clearer if you look at the `Makefile.am' for the
`replace' subdirectory:
|
## Makefile.am -- Process this file with automake to produce Makefile.in
INCLUDES = -I$(top_builddir) -I$(top_srcdir)
noinst_LIBRARIES = libreplace.a
libreplace_a_SOURCES =
libreplace_a_LIBADD = @LIBOBJS@
|
The code tells automake that I want to build a library for use
within the build tree (i.e. not installed -- `noinst'), and that
has no source files by default. The clever part here is that when
someone comes to install Sic, they will run configure which
will test for strerror , and add `strerror.o' to
LIBOBJS if the target host environment is missing its own
implementation. Now, when `configure' creates
`replace/Makefile' (as I asked it to with AC_OUTPUT ),
`@LIBOBJS@' is replaced by the list of objects required on the
installer's machine.
Having done all this at configure time, when my user runs
make , the files required to replace functions missing
from their target machine will be added to `libreplace.a'.
Unfortunately this is not quite enough to start building the project.
First I need to add a top-level `Makefile.am' from which to
ultimately create a top-level `Makefile' that will descend into
the various subdirectories of the project:
|
## Makefile.am -- Process this file with automake to produce Makefile.in
SUBDIRS = replace sic
|
And `configure.in' must be told where it can find instances of
Makefile.in :
|
AC_OUTPUT(Makefile replace/Makefile sic/Makefile)
|
I have written a bootstrap script for Sic, for details see
8. Bootstrapping:
|
#! /bin/sh
set -x
aclocal -I config
autoheader
automake --foreign --add-missing --copy
autoconf
|
The `--foreign' option to automake tells it to relax
the GNU standards for various files that should be present in a
GNU distribution. Using this option saves me from having to create
empty files as we did in 5. A Minimal GNU Autotools Project.
Right. Let's build the library! First, I'll run bootstrap :
|
$ ./bootstrap
+ aclocal -I config
+ autoheader
+ automake --foreign --add-missing --copy
automake: configure.in: installing config/install-sh
automake: configure.in: installing config/mkinstalldirs
automake: configure.in: installing config/missing
+ autoconf
|
The project is now in the same state that an end-user would see, having
unpacked a distribution tarball. What follows is what an end user might
expect to see when building from that tarball:
|
$ ./configure
creating cache ./config.cache
checking for a BSD compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking whether make sets ${MAKE}... yes
checking for working aclocal... found
checking for working autoconf... found
checking for working automake... found
checking for working autoheader... found
checking for working makeinfo... found
checking for gcc... gcc
checking whether the C compiler (gcc ) works... yes
checking whether the C compiler (gcc ) is a cross-compiler... no
checking whether we are using GNU C... yes
checking whether gcc accepts -g... yes
checking for ranlib... ranlib
checking how to run the C preprocessor... gcc -E
checking for ANSI C header files... yes
checking for unistd.h... yes
checking for errno.h... yes
checking for string.h... yes
checking for working const... yes
checking for size_t... yes
checking for strerror... yes
updating cache ./config.cache
creating ./config.status
creating Makefile
creating replace/Makefile
creating sic/Makefile
creating config.h
|
Compare this output with the contents of `configure.in', and notice
how each macro is ultimately responsible for one or more consecutive
tests (via the Bourne shell code generated in `configure'). Now
that the `Makefile's have been successfully created, it is safe to
call make to perform the actual compilation:
|
$ make
make all-recursive
make[1]: Entering directory `/tmp/sic'
Making all in replace
make[2]: Entering directory `/tmp/sic/replace'
rm -f libreplace.a
ar cru libreplace.a
ranlib libreplace.a
make[2]: Leaving directory `/tmp/sic/replace'
Making all in sic
make[2]: Entering directory `/tmp/sic/sic'
gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -g -O2 -c builtin.c
gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -g -O2 -c error.c
gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -g -O2 -c eval.c
gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -g -O2 -c list.c
gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -g -O2 -c sic.c
gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -g -O2 -c syntax.c
gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -g -O2 -c xmalloc.c
gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -g -O2 -c xstrdup.c
gcc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -g -O2 -c xstrerror.c
rm -f libsic.a
ar cru libsic.a builtin.o error.o eval.o list.o sic.o syntax.o xmalloc.o
xstrdup.o xstrerror.o
ranlib libsic.a
make[2]: Leaving directory `/tmp/sic/sic'
make[1]: Leaving directory `/tmp/sic'
|
On this machine, as you can see from the output of configure
above, I have no need of the fallback implementation of strerror,
so `libreplace.a' is empty. On another machine this might not be
the case. In any event, I now have a compiled `libsic.a' -- so
far, so good.
9.3 A Sample Shell Application
What I need now is a program that uses `libsic.a', if only to give
me confidence that it is working. In this section, I will write a
simple shell which uses the library. But first, I'll create a directory
to put it in:
|
$ mkdir src
$ ls -F
COPYING Makefile.am aclocal.m4 configure* config/ sic/
INSTALL Makefile.in bootstrap* configure.in replace/ src/
$ cd src
|
In order to put this shell together, we need to provide just a few
things for integration with `libsic.a'...
9.3.1 `sic_repl.c'
In `sic_repl.c'(13) there is a loop for
reading strings typed by the user, evaluating them and printing the
results. GNU readline is ideally suited to this, but it is not
always available -- or sometimes people simply may not wish to use it.
With the help of GNU Autotools, it is very easy to cater for building with
and without GNU readline. `sic_repl.c' uses this function to
read lines of input from the user:
|
static char *
getline (FILE *in, const char *prompt)
{
static char *buf = NULL; /* Always allocated and freed
from inside this function. */
XFREE (buf);
buf = (char *) readline ((char *) prompt);
#ifdef HAVE_ADD_HISTORY
if (buf && *buf)
add_history (buf);
#endif
return buf;
}
|
To make this work, I must write an Autoconf macro which adds an option
to `configure', so that when the package is installed, it will use
the readline library if `--with-readline' is used:
|
AC_DEFUN(SIC_WITH_READLINE,
[AC_ARG_WITH(readline,
[ --with-readline compile with the system readline library],
[if test x"${withval-no}" != xno; then
sic_save_LIBS=$LIBS
AC_CHECK_LIB(readline, readline)
if test x"${ac_cv_lib_readline_readline}" = xno; then
AC_MSG_ERROR(libreadline not found)
fi
LIBS=$sic_save_LIBS
fi])
AM_CONDITIONAL(WITH_READLINE, test x"${with_readline-no}" != xno)
])
|
Having put this macro in the file `config/readline.m4', I must also
call the new macro (SIC_WITH_READLINE) from `configure.in'.
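When `--with-readline' is not given, something else has to supply the
lines of input. The following is only a sketch of one way to do that,
and is not the code from `sic_repl.c': the name sketch_readline and its
FILE argument are assumptions made for illustration. It relies on the
fact that AC_CHECK_LIB(readline, readline) defines HAVE_LIBREADLINE when
the library is found, so the stand-in is compiled only when it is needed.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifndef HAVE_LIBREADLINE
/* Hypothetical stand-in for readline(): print PROMPT (if any), read one
   line from IN, and return it in freshly allocated memory without the
   trailing newline.  Returns NULL at end of input or if memory runs out.
   The caller frees the result, as it would for readline().  */
static char *
sketch_readline (FILE *in, const char *prompt)
{
  size_t size = 64;
  size_t len = 0;
  char *line;
  int c;

  if (prompt)
    {
      fputs (prompt, stdout);
      fflush (stdout);
    }

  line = malloc (size);
  if (!line)
    return NULL;

  while ((c = getc (in)) != EOF && c != '\n')
    {
      if (len + 1 >= size)
        {
          /* Double the buffer, keeping room for the terminator.  */
          char *bigger = realloc (line, size * 2);
          if (!bigger)
            {
              free (line);
              return NULL;
            }
          line = bigger;
          size *= 2;
        }
      line[len++] = (char) c;
    }

  if (c == EOF && len == 0)
    {
      free (line);
      return NULL;
    }

  line[len] = '\0';
  return line;
}
#endif /* !HAVE_LIBREADLINE */
```

With a function of this shape available, a line reader like the one
shown earlier could call it in place of readline whenever
HAVE_LIBREADLINE is not defined, and the rest of the loop would not
need to know which one it got.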
9.3.2 `sic_syntax.c'
The syntax of the commands in the shell I am writing is defined by a set
of syntax handlers which are loaded into `libsic' at startup. I
can get the C preprocessor to do most of the repetitive code for me, and
just fill in the function bodies:
|
#if HAVE_CONFIG_H
# include <sic/config.h>
#endif
#include "sic.h"
/* List of builtin syntax. */
#define syntax_functions \
SYNTAX(escape, "\\") \
SYNTAX(space, " \f\n\r\t\v") \
SYNTAX(comment, "#") \
SYNTAX(string, "\"") \
SYNTAX(endcmd, ";") \
SYNTAX(endstr, "")
/* Prototype Generator. */
#define SIC_SYNTAX(name) \
int name (Sic *sic, BufferIn *in, BufferOut *out)
#define SYNTAX(name, string) \
extern SIC_SYNTAX (CONC (syntax_, name));
syntax_functions
#undef SYNTAX
/* Syntax handler mappings. */
Syntax syntax_table[] = {
#define SYNTAX(name, string) \
{ CONC (syntax_, name), string },
syntax_functions
#undef SYNTAX
{ NULL, NULL }
};
|
This code writes the prototypes for the syntax handler functions, and
creates a table which associates each with one or more characters that
might occur in the input stream. The advantage of writing the code this
way is that when I want to add a new syntax handler later, it is a simple
matter of adding a new row to the syntax_functions macro, and
writing the function itself.
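The same technique can be tried out in isolation. What follows is a
reduced sketch, not part of Sic: every demo_ name is hypothetical, the
handlers take a plain string instead of the Sic buffer types, and CONC
is defined locally only so that the example stands on its own.

```c
#include <stddef.h>
#include <string.h>

/* Token pasting helper, defined here so the sketch is self-contained.  */
#define CONC(a, b) a##b

/* One row per handler: the only place a new handler has to be listed.  */
#define demo_functions \
  DEMO (escape, "\\") \
  DEMO (comment, "#") \
  DEMO (endcmd, ";")

typedef int demo_handler (const char *input);

typedef struct
{
  demo_handler *handler;	/* function to call */
  const char *chars;		/* characters that trigger it */
} DemoEntry;

/* First expansion: one prototype per row.  */
#define DEMO(name, string) static demo_handler CONC (demo_, name);
demo_functions
#undef DEMO

/* Second expansion: one table entry per row.  */
static DemoEntry demo_table[] = {
#define DEMO(name, string) { CONC (demo_, name), string },
  demo_functions
#undef DEMO
  { NULL, NULL }
};

/* The function bodies are still written by hand.  */
static int demo_escape (const char *input) { (void) input; return 1; }
static int demo_comment (const char *input) { (void) input; return 2; }
static int demo_endcmd (const char *input) { (void) input; return 3; }

/* Call the handler registered for CH, or return -1 if there is none.  */
static int
demo_dispatch (int ch, const char *input)
{
  DemoEntry *entry;

  for (entry = demo_table; entry->handler; ++entry)
    if (ch != '\0' && strchr (entry->chars, ch))
      return entry->handler (input);

  return -1;
}
```

Adding a fourth handler to this sketch means adding one DEMO row and
one function body; the prototype and the table entry follow from the
row without further editing.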
9.3.3 `sic_builtin.c'
In addition to the syntax handlers I have just added to the Sic shell,
the language of this shell is also defined by the builtin commands it
provides. The infrastructure for this file is built from a table of
functions which is fed into various C preprocessor macros, just as I did
for the syntax handlers.
One builtin handler function has special status: builtin_unknown.
This is the builtin that is called if the Sic library cannot find a
suitable builtin function to handle the current input command. At first
this doesn't sound especially important -- but it is the key to any
shell implementation. When there is no builtin handler for the command,
the shell will search the user's command path, `$PATH', to find a
suitable executable. And this is the job of builtin_unknown:
|
int
builtin_unknown (Sic *sic, int argc, char *const argv[])
{
char *path = path_find (argv[0]);
int status = SIC_ERROR;
if (!path)
{
sic_result_append (sic, "command \"");
sic_result_append (sic, argv[0]);
sic_result_append (sic, "\" not found");
}
else if (path_execute (sic, path, argv) != SIC_OKAY)
{
sic_result_append (sic, "command \"");
sic_result_append (sic, argv[0]);
sic_result_append (sic, "\" failed: ");
sic_result_append (sic, strerror (errno));
}
else
status = SIC_OKAY;
return status;
}
static char *
path_find (const char *command)
{
char *path = xstrdup (command);
if (*command == '/')
{
if (access (command, X_OK) < 0)
goto notfound;
}
else
{
char *PATH = getenv ("PATH");
char *pbeg, *pend;
size_t len;
/* Without a PATH there is nowhere to search. */
if (!PATH)
goto notfound;
for (pbeg = PATH; *pbeg != '\0'; pbeg = pend)
{
pbeg += strspn (pbeg, ":");
len = strcspn (pbeg, ":");
pend = pbeg + len;
/* Only separators were left, so pbeg is at the end of PATH. */
if (len == 0)
break;
path = XREALLOC (char, path, 2 + len + strlen(command));
*path = '\0';
strncat (path, pbeg, len);
if (path[len -1] != '/') strcat (path, "/");
strcat (path, command);
if (access (path, X_OK) == 0)
break;
}
if (*pbeg == '\0')
goto notfound;
}
return path;
notfound:
XFREE (path);
return NULL;
}
|
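The path_execute function called by builtin_unknown is not shown above.
As a rough idea of what such a helper has to do, here is a minimal
sketch using the POSIX fork, execv and waitpid calls. The name
run_program, its signature and its return convention are assumptions
for illustration only; the real function also takes the Sic state and
reports SIC_OKAY or SIC_ERROR.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the executable at PATH with arguments ARGV
   in a child process and wait for it.  Returns the exit status of the
   child, or -1 if the child could not be created or did not exit
   normally.  */
static int
run_program (const char *path, char *const argv[])
{
  int status;
  pid_t pid = fork ();

  if (pid < 0)
    return -1;

  if (pid == 0)
    {
      /* In the child: replace this process with the command.  */
      execv (path, argv);
      /* Only reached if execv failed.  */
      _exit (127);
    }

  /* In the parent: block until the child has finished.  */
  if (waitpid (pid, &status, 0) < 0)
    return -1;

  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

The child uses _exit rather than exit after a failed execv so that it
does not flush a second copy of any output buffered by the parent.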
Running `autoscan' again at this point adds
AC_CHECK_FUNCS(strcspn strspn) to `configure.scan'. This
tells me that these functions are not truly portable. As before, I
provide fallback implementations for these functions in case they are
missing from the target host -- and as it turns out, they are easy to
write:
|
/* strcspn.c -- implement strcspn() for architectures without it */
#if HAVE_CONFIG_H
# include <sic/config.h>
#endif
#include <sys/types.h>
#if STDC_HEADERS
# include <string.h>
#elif HAVE_STRINGS_H
# include <strings.h>
#endif
#if !HAVE_STRCHR
# ifndef strchr
# define strchr index
# endif
#endif
size_t
strcspn (const char *string, const char *reject)
{
size_t count = 0;
while (strchr (reject, *string) == 0)
++count, ++string;
return count;
}
|
There is no need to add any code to `Makefile.am', because the
configure script will automatically add the names of the
missing function sources to `@LIBOBJS@'.
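The matching fallback for strspn is not reproduced here. A minimal
sketch of the idea follows; it is not the version from the Sic sources.
The function is named sketch_strspn (an assumption) so that it can be
compiled alongside the C library's own strspn, and it is written with
strchr for brevity.

```c
#include <stddef.h>
#include <string.h>

/* Return the length of the initial part of STRING that consists only
   of characters found in ACCEPT.  */
size_t
sketch_strspn (const char *string, const char *accept)
{
  size_t count = 0;

  /* strchr also matches the terminating NUL of ACCEPT, so the end of
     STRING has to be tested explicitly.  */
  while (*string != '\0' && strchr (accept, *string) != NULL)
    ++count, ++string;

  return count;
}
```

Note the extra test for the end of the string: the strcspn fallback can
rely on strchr matching the terminating NUL to stop its loop, but here
that same match would make the loop run past the end.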
The fallback strcspn implementation uses the Autoconf-generated
`config.h' to get information about the availability of headers and
type definitions. It is interesting that autoscan reports
that strchr and strrchr, which are used in the fallback
implementations of strcspn and strspn respectively, are
themselves not portable! Luckily, the Autoconf manual tells me exactly
how to deal with this: by adding some code to my `common.h'
(paraphrased from the literal code in the manual):
|
#if !STDC_HEADERS
# if !HAVE_STRCHR
# define strchr index
# define strrchr rindex
# endif
#endif
|
And another macro in `configure.in':
9.3.4 `sic.c' & `sic.h'
Since the application binary has no installed header files, there is
little point in maintaining a corresponding header file for every
source. Instead, all of the structures shared by these files, and the
non-static functions in these files, are declared in `sic.h':
|
#ifndef SIC_H
#define SIC_H 1
#include <sic/common.h>
#include <sic/sic.h>
#include <sic/builtin.h>
BEGIN_C_DECLS
extern Syntax syntax_table[];
extern Builtin builtin_table[];
extern int evalstream (Sic *sic, FILE *stream);
extern int evalline (Sic *sic, char **pline);
extern int source (Sic *sic, const char *path);
extern int syntax_init (Sic *sic);
extern int syntax_finish (Sic *sic, BufferIn *in, BufferOut *out);
END_C_DECLS
#endif /* !SIC_H */
|
To hold together everything you have seen so far, the main
function creates a Sic parser and initialises it by adding syntax
handler functions and builtin functions from the two tables defined
earlier, before handing control to evalstream which will
eventually exit when the input stream is exhausted.
|
int
main (int argc, char * const argv[])
{
int result = EXIT_SUCCESS;
Sic *sic = sic_new ();
/* initialise the system */
if (sic_init (sic) != SIC_OKAY)
sic_fatal ("sic initialisation failed");
signal (SIGINT, SIG_IGN);
setbuf (stdout, NULL);
/* initial symbols */
sicstate_set (sic, "PS1", "] ", NULL);
sicstate_set (sic, "PS2", "- ", NULL);
/* evaluate the input stream */
evalstream (sic, stdin);
exit (result);
}
|
Now, the shell can be built and used:
|
$ ./bootstrap
...
$ ./configure --with-readline
...
$ make
...
make[2]: Entering directory `/tmp/sic/src'
gcc -DHAVE_CONFIG_H -I. -I.. -I../sic -I.. -I../sic -g -c sic.c
gcc -DHAVE_CONFIG_H -I. -I.. -I../sic -I.. -I../sic -g -c sic_builtin.c
gcc -DHAVE_CONFIG_H -I. -I.. -I../sic -I.. -I../sic -g -c sic_repl.c
gcc -DHAVE_CONFIG_H -I. -I.. -I../sic -I.. -I../sic -g -c sic_syntax.c
gcc -g -O2 -o sic sic.o sic_builtin.o sic_repl.o sic_syntax.o \
../sic/libsic.a ../replace/libreplace.a -lreadline
make[2]: Leaving directory `/tmp/sic/src'
...
$ ./src/sic
] pwd
/tmp/sic
] ls -F
Makefile aclocal.m4 config.cache configure* sic/
Makefile.am bootstrap* config.log configure.in src/
Makefile.in config/ config.status* replace/
] exit
$
|
This chapter has developed a solid foundation of code, which I will
return to in 12. A Large GNU Autotools Project, when Libtool will join
the fray. The chapters leading up to that explain what Libtool is for,
how to use it and integrate it into your own projects, and the
advantages it offers over building shared libraries with Automake (or
even just Make) alone.
|