In this chapter we'll learn about two mildly advanced uses of the preprocessor: "function-like" preprocessor macros (also called "macros with arguments") and "nested header files" (also known as "nested #include files").
20.1: Function-Like Preprocessor Macros
So far, we've been defining simple preprocessor macros with simple values, such as
#define MAXLINE 200
and
#define DATAFILE "data.dat"
These macros always expand to constant text (in these examples, the integer constant 200 and the string literal "data.dat", respectively) wherever they're used. However, it's also possible to define macros which expand to text which is different each time, depending on some subsidiary text which you specify. These macros take arguments, in much the same way that functions take arguments. In either case, the outcome (the expansion of the macro, like the action of the function) depends in some way on the particular values passed to it as arguments. The basic syntax of a function-like macro definition is
#define macroname( args ) expansion
There must be no space between macroname and the open parenthesis.
We will illustrate the use of function-like macros with several examples.
In a previous chapter, we used the "bitwise" operators &, |, and ~ to manipulate individual bits within an integer value or "flags word." In one application, we defined several simple macros whose values were "bitmasks":
#define DIRTY 0x01
#define OPEN 0x02
#define VERBOSE 0x04
Then we used code like
flags |= DIRTY;
to "set the DIRTY bit," and code like
flags &= ~DIRTY;
to clear the DIRTY bit, and code like
if(flags & DIRTY)
to test it. With enough practice, these idioms become familiar enough that they can be read immediately, but suppose we wanted to make them less cryptic. Using the preprocessor, we'll be able to set up macros so that we can write
SETBIT(flags, DIRTY);
and
CLEARBIT(flags, DIRTY);
and
if(TESTBIT(flags, DIRTY))
The definition of the SETBIT() macro might look like this:
#define SETBIT(x, b) x |= b
When a function-like macro is expanded, the preprocessor keeps track of the "arguments" it was "called" with. When we write
SETBIT(flags, DIRTY);
we're invoking the SETBIT() macro with a first argument of flags and a second argument of DIRTY. Within the definition of the macro, those arguments were known as x and b. So in the replacement text of the macro, x |= b, everywhere that x appears it will be further replaced by (in this case) flags, and everywhere that b appears it will be replaced by DIRTY. So the invocation
SETBIT(flags, DIRTY);
will result in the expansion
flags |= DIRTY;
Notice that the semicolon had nothing to do with macro expansion; it appeared following the close parenthesis of the invocation, and so shows up following the final expansion.
Similarly, we can define the CLEARBIT() and TESTBIT() macros like this:
#define CLEARBIT(x, b) x &= ~b
#define TESTBIT(x, b) x & b
Convince yourself that the invocations
CLEARBIT(flags, DIRTY);
and
if(TESTBIT(flags, DIRTY))
will result in the expansions
flags &= ~DIRTY;
and
if(flags & DIRTY)
as desired.
Just as for a regular function, parameter names such as x and b in a function-like macro definition are arbitrary; they're just used to indicate where in the replacement text the actual argument "values" should be plugged in. Also, those parameter names are not looked for within character or string constants. If you had a macro like
#define XX(a, b) printf("%s is a %s\n", a, b)
then the invocation
XX("John", "pumpkin-head");
would result in
printf("%s is a %s\n", "John", "pumpkin-head");
It would not result in
printf("%s is "John" %s\n", "John", "pumpkin-head");
which (in this case, anyway) would not have been at all what you wanted.
If we remember that (other than being careful not to expand macro arguments inside of string and character constants) the preprocessor is otherwise pretty dumb and literal-minded, we can see why there must not be a space between the macro name and the open parenthesis in a function-like macro definition. If we wrote
#define SETBIT (x, b) x |= b
the preprocessor would think we were defining a simple macro, named SETBIT, with the (rather meaningless) replacement text (x, b) x |= b, and every time it saw SETBIT, it would replace it with (x, b) x |= b. (It would ignore any parentheses and arguments that the invocation of SETBIT happened to be followed with; that is, after the incorrect definition, the invocation
SETBIT(flags, DIRTY);
would expand to
(x, b) x |= b(flags, DIRTY);
where the (flags, DIRTY) part passed through without modification, along with the trailing semicolon.)
There are a few potential pitfalls associated with preprocessor macros, and with function-like ones in particular. To illustrate these, let's look at another example. C has no built-in exponentiation operator; if you want to square something, the easiest way is usually to multiply it by itself. Suppose that you got tired of writing
x * x
and
a * a + b * b
and
(x + 1) * (x + 1)
Knowing about function-like preprocessor macros, you might be inspired to define a SQUARE() macro:
#define SQUARE(z) z * z
Now you can write things like SQUARE(x) and SQUARE(a) + SQUARE(b), and this seems like it will be workable and convenient. But wait: what about that third example? If you write
y = SQUARE(x + 1);
the simpleminded preprocessor will expand it to
y = x + 1 * x + 1;
Remember, the preprocessor doesn't evaluate arguments the way a function call would; it just performs textual substitution. So in this last example, the "value" of the macro parameter z is x + 1, and everywhere that a z had appeared in the replacement text, the preprocessor fills in x + 1. But when the rest of the compiler sees the result, it will give multiplication higher precedence, as usual, and it will interpret the result as if you had written
y = x + (1 * x) + 1;
which will not usually give you the result you wanted!
How can we fix this problem? We could forbid ourselves to ever "call" the SQUARE() macro on an argument that wasn't a single constant or variable name, but this seems like a harsh restriction. A better solution is to play with the definition of the macro itself: since the expansion we want is
(x + 1) * (x + 1)
we can achieve that by defining the macro like this:
#define SQUARE(z) (z) * (z)
Now
y = SQUARE(x + 1);
expands to
y = (x + 1) * (x + 1);
as we wished.
There's another problem, though: what if we write
q = 1 / SQUARE(r);
Now we get
q = 1 / (r) * (r);
and the rest of the compiler interprets this as
q = (1 / (r)) * (r);
(Multiplication and division have the same precedence, and by default they go from left to right.) What can we do this time? We could enclose the invocation of the SQUARE() macro in extra parentheses, like this:
q = 1 / (SQUARE(r));
but that seems like a real nuisance to remember. A better solution is to build those extra parentheses into the definition of the macro, too:
#define SQUARE(z) ((z) * (z))
Now the code 1 / SQUARE(r) expands to 1 / ((r) * (r)) and we have a macro that's safe against all of the troublesome invocations we've tried so far.
There's a third potential problem, though: suppose we write
y = SQUARE(x++);
Even with all of our parentheses, this expands to
y = ((x++) * (x++));
and this is a distinct no-no, because we're incrementing x twice within the same expression. We might end up with y containing the value x * x, as we wanted, but it's somewhat more likely that we'll end up with (x + 1) * x or x * (x + 1), instead. (We're now worried not just about what the macro expands to, but what the resultant expression evaluates to.) Furthermore, since expressions like x++ * x++ are undefined according to the ANSI/ISO C Standard, they can actually result in anything, even complete nonsense. So SQUARE(x++) simply isn't going to work. (The explicit parentheses, by the way, don't make the expression any less undefined.)
There's no good fix for this third problem. We are going to have to remember that when we invoke function-like macros, the macro might expand one of its arguments multiple times, so we had better not ever give it an argument with a side effect, such as x++, or else the side effect might end up happening multiple times, with undefined results. (That's one reason we always use capital letters for macro names, to remind ourselves that they are special, and that we might have to be careful when invoking them.)
The other way around the third problem is not to use a function-like preprocessor macro at all, but instead to use a genuine function. If we defined
int square(int x)
{
	return x * x;
}
then we wouldn't have any of these problems. (Of course, then we'd have the limitation that we could only use this square function on arguments of a certain type, in this case, int. We could declare it as accepting and returning type double, but then we might worry that it was doing needless floating-point conversions in the cases where we handed it integer values...)
When should you use a function-like macro and when should you use a real function? In most cases, it's safer to use real functions. Generally, you use function-like macros only when the code they expand to is quite small and simple, and when defining and using a real function would for some reason be awkward, or when the code will be executed so often that the overhead of calling a real function would significantly impact the program's efficiency.
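As an example of how a real function might be awkward, notice that we couldn't write SETBIT() and CLEARBIT() as conventional functions taking the same arguments, because C passes arguments by value: a function receives a copy of each argument and can't modify the caller's variable, yet SETBIT() and CLEARBIT() are supposed to. (That is, SETBIT(flags, DIRTY) modifies flags. A function could do so only if we passed it a pointer, as in &flags.)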
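To summarize the important rules of this section, whenever defining a function-like macro, remember:
1. Put parentheses around each instance of each macro parameter in the replacement text.
2. Put parentheses around the entire replacement text.
3. Capitalize the macro name, as a reminder that it is a macro and that its arguments should not have side effects.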
Rewriting our first three examples to follow these rules, we'd have:
#define SETBIT(x, b) ((x) |= (b))
#define CLEARBIT(x, b) ((x) &= ~(b))
#define TESTBIT(x, b) ((x) & (b))
(It's harder to see how SETBIT() and CLEARBIT() might fail if they weren't parenthesized, but unless you're really sure of yourself, there usually isn't a reason not to use the extra parentheses.)
A few final notes about function-like preprocessor macros: Sometimes, people try to write function-like macros which are even more like functions in that they expand to multiple statements; however, this is considerably trickier than it looks (at least, if it's not to fall victim to additional sets of pitfalls). Also, people sometimes wish for macros that take a variable number of arguments (in much the same way that the printf function accepts a variable number of arguments). The original ANSI C Standard provided no good way to do this; the later C99 revision added variadic macros, which are written with ... in the parameter list and __VA_ARGS__ in the replacement text.
20.2: Nested Header Files
Suppose you have written a little set of functions which you expect that other parts of your program (or other parts of other people's programs) will call. So that it will be easier for you (and them) to call the functions correctly, suppose that you have written a header file containing external prototype declarations for the functions, and that the prototypes look like this:
extern int f1(int);
extern double f2(int, double);
extern int f3(int, FILE *);
You might put these three declarations in a file called funcs.h.
For now, we don't need to worry about what these three functions might do, other than to notice that f3 obviously reads from or writes to a FILE * stdio stream.
Now, suppose that you have a source file containing a function which calls f1 and/or f2. At the top of that source file, you would put the line
#include "funcs.h"
However, if you were unlucky, the compiler would get down to the line
extern int f3(int, FILE *);
within funcs.h and complain, because it would not know what a FILE is and so would not know how to think about a function that accepts a pointer to one. If the calling program (that is, the source file that included "funcs.h") didn't call f3 or printf or fopen or any of the other stdio functions, it would have no reason to include <stdio.h>, and FILE would remain undefined. (If, on the other hand, the source file in question did happen to include <stdio.h>, and if it included it before it included "funcs.h", there would be no problem.)
What's the right thing to do here? We could say that anyone who includes "funcs.h" must always include <stdio.h> first. But you can think of header files a little bit like you think of functions: it's nice if they're "black boxes", if you don't have to worry about what's inside them or about including them in a certain order.
Another way to think about the situation is this: since the prototype for f3 inside of funcs.h needs stdio.h, maybe we should put the line
#include <stdio.h>
right there at the top of funcs.h! Is that legal? Can the preprocessor handle seeing an #include directive when it's already in the middle of processing another #include directive? The answer is that yes, it can; header files (that is, #include directives) may be nested. (They may be nested up to a depth of at least 8, although many compilers probably allow more.) Once funcs.h takes care of its own needs, by including <stdio.h> itself, the eventual top-level file (that is, the one you compile, the one that includes "funcs.h") won't get error messages about FILE being undefined, and won't have to worry about whether it includes <stdio.h> or not.
Or will it? What if the top-level source file does include <stdio.h>? Now <stdio.h> will end up being processed twice, once when the top-level source file asks for it, and once when funcs.h asks for it. Will everything work correctly if <stdio.h> is included twice? Again, the answer is yes; the Standard requires that the standard header files protect themselves against multiple inclusion.
It's good that the standard header files are protected in this way. But how do they protect themselves? Suppose that we'd like to protect our own header files (such as funcs.h) in the same sort of way. How would we do it?
Here's the usual trick. We rewrite funcs.h like this:
#ifndef FUNCS_H
#define FUNCS_H
#include <stdio.h>
extern int f1(int);
extern double f2(int, double);
extern int f3(int, FILE *);
#endif
All we've done is add the #ifndef and #define lines at the top, and the #endif line at the bottom. (The macro name FUNCS_H doesn't really mean anything; it's just one we don't and won't use anywhere else, so we use the convention of having its name mimic the name of the header file we're protecting.) Now, here's what happens: the first time the compiler processes funcs.h, it comes across the line
#ifndef FUNCS_H
and FUNCS_H is not defined, so it proceeds. The very next thing it does is #define the macro FUNCS_H (with a replacement text of nothing, but that's okay, because we're never going to expand FUNCS_H, just test whether it's defined or not). Then it processes the rest of funcs.h, as usual. But if that same run of the compiler ever comes across funcs.h a second time, then when it reaches the #ifndef FUNCS_H line again, FUNCS_H will already be defined, so the preprocessor will skip everything down to the matching #endif line, which is to say the whole body of the header file. Nothing in the file will be processed a second time.
(You might wonder what would tend to go wrong if a header file were processed multiple times. It's okay to issue multiple external declarations for the same function or global variable, as long as they're all consistent, so those wouldn't cause any problems. And the preprocessor also isn't supposed to complain if you #define a macro which is already defined, as long as it has the same value, that is, the same replacement text. But the compiler will complain if you try to define a structure type you've already defined, or a typedef you've already defined (see section 18.1.6), etc. So the protection against multiple inclusion is important in the general case.)
When header files are protected against multiple inclusion by the #ifndef trick, then header files can include other files to get the declarations and definitions they need, and no errors will arise because one file forgot to (or didn't know that it had to) include one header before another, and no multiple-definition errors will arise because of multiple inclusion. I recommend this technique.
In closing, though, I might mention that this technique is somewhat controversial. When header files include other header files, it can be hard to track down the chain of who includes what and who defines what, if for some reason you need to know. Therefore, some style guides disallow nested header files. (I don't know how these style guides recommend that you address the issue of having to require that certain files be included before others.)