Chapter 18: Miscellaneous C Features

This chapter goes back and fills in details on several C language features which this series of notes (and perhaps other introductory works) has downplayed or omitted.

18.1: Types

18.2: More Operators

18.3: More Statements



18.1: Types

So far, we've seen the basic types char, int, long int, float, and double. This section introduces the last few basic types: void, short int, long double, and the unsigned types. Also, we'll meet storage classes, typedef, and the type qualifiers const and volatile.

18.1.1: void

18.1.2: short int

18.1.3: unsigned integers

18.1.4: long double

18.1.5: Storage Classes

18.1.6 Type Definitions (typedef)

18.1.7: Type Qualifiers


18.1.1: void

From time to time we have seen the keyword void lurking in various programs and code samples. void is sort of a "placeholder" type, used in circumstances where you need a type name but there's not really any one right type to use. Formally, we can say that void is a type with no values.

There are three main uses of the void type:

  1. As the return type of a function which does not return a value. A function declared as
    	void f()
    

    is declared as "void" or "returning void," which actually means that it returns no value. The compiler will not complain if you "fall off the end" of a void function without executing a return statement; it will complain, however, if a void function contains a return statement that specifies a value to be returned. (As far as the low-level syntax of the return statement is concerned, the expression is optional; but the expression is required in functions that return values and disallowed in void functions.)

  2. As the argument list in the prototype of a function that accepts no parameters. In a function definition such as
    	int f()
    	{
    	...
    	}
    

    the empty parentheses indicate that the function accepts no parameters. But (for historical reasons) in an external function declaration such as

    	extern int f();
    

    the empty parentheses indicate that we don't know how many (or what type of) parameters the function accepts. In either case, we can make the fact that the function accepts zero parameters explicit by using the keyword void in the parameter list:

    	extern int f(void);
    
    	int f(void)
    	{
    	...
    	}
    

    For obvious reasons, if void appears in a parameter list, it must be the first and only parameter, and it must not declare an argument name. (That is, prototypes like int f(int, void) and int f(void x) are meaningless and illegal.)

  3. As a pointer type, to indicate a "generic pointer" which might in fact point to any type. We need "generic pointers" when we're using functions like malloc. malloc returns a pointer to n bytes of memory, which we may use as any type of pointer we wish. Normally, it is an error to use a value of one pointer type where another pointer type is required. For example, the fragments
    	int i;
    	double *dp = &i;	/* WRONG */
    

    and

    	int *ip = dp;		/* WRONG */
    

    would both generate errors, because you can't assign back and forth between int pointers and double pointers. However, the type void * ("pointer to void") is special: it is legal to assign a value of type pointer-to-void to a variable of some other pointer type, and vice versa. (In case the pointer types have different sizes or representations, the compiler will automatically perform conversions. We'll say more about type conversions, including pointer type conversions, in a later section.) So the lines

    	#include <stdlib.h>
    
    	char *cp = malloc(10);
    	int *ip = malloc(sizeof(int));
    	double *dp = malloc(sizeof(double));
    

    are all legal, since <stdlib.h> declares malloc as returning void *, indicating that an assignment of malloc's return value to any pointer type is permissible.
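As a small sketch of these uses together, here is a hypothetical helper, swap_any (not a standard library function), that swaps two objects of any type byte by byte. It returns void; its void * parameters accept an int *, a double *, or any other object pointer without a cast; and the equally hypothetical answer function shows the explicit (void) parameter list.

```c
#include <stddef.h>

/* Hypothetical helper: swap two objects of any type, one byte at a time.
   The void * parameters accept any object pointer without a cast. */
void swap_any(void *a, void *b, size_t size)
{
	unsigned char *pa = a;	/* void * converts implicitly to unsigned char * */
	unsigned char *pb = b;
	size_t k;

	for(k = 0; k < size; k++)
		{
		unsigned char tmp = pa[k];
		pa[k] = pb[k];
		pb[k] = tmp;
		}
	/* falling off the end of a void function is fine; no return needed */
}

/* A function that accepts no arguments, declared with an explicit (void). */
int answer(void)
{
	return 42;
}
```

A caller can write swap_any(&x, &y, sizeof(x)) for two ints, or pass two double pointers with sizeof(double), with no casts either way.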


18.1.2: short int

Another type we haven't met is short int. A short int has the same guarantees as a plain int: it will hold integers in at least the range +-32,767. The difference between short int and plain int is that short int might be smaller. Remember, the definitions of both of these types (like all C types) promise only that they have at least the specified range. On some machines, plain int will hold numbers greater than 32,767. (On 32-bit machines, for example, it's common for plain int to be 32 bits, and to hold +-2,147,483,647. Yes, this is all the way up to the minimum range for a long int.) You might use a short int when you had a lot of them and were worried about saving memory. If you had a large array of integers all less than 32,768, or a large number of structures with one or more members holding integers all less than 32,768, you might declare the array or the structure members as short int, to avoid devoting 4 bytes to each of them on 32-bit machines.
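Rather than assuming any particular size, a program can ask the standard header <limits.h> for the actual limits on the machine it's compiled for. This is a minimal sketch; fits_in_short and readings are hypothetical names used only for illustration.

```c
#include <limits.h>

/* Hypothetical helper: report whether a long value would fit in a
   short int on this machine, using the real limits from <limits.h>. */
int fits_in_short(long v)
{
	return v >= SHRT_MIN && v <= SHRT_MAX;
}

/* Many small values: short int may take half the storage of int
   on a typical machine with 32-bit ints. */
short int readings[1000];
```

The guarantee in the text shows up as SHRT_MAX being at least 32767 everywhere, while sizeof(short int) is never larger than sizeof(int).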


18.1.3: unsigned integers

For each of the integral types, there is a corresponding unsigned type. Thus, we have unsigned char, unsigned short int, unsigned int, and unsigned long int. There are two differences between the unsigned types and the default, signed types:

  1. They do not hold negative numbers; their range is from 0 up to some maximum.
  2. They have guaranteed properties on overflow. If the range of unsigned int on some machine is 0-65,535, and if an unsigned int variable contains the value 65,535, then adding 1 to it will wrap around to 0. Whenever a calculation involving unsigned integers overflows, or tries to go negative, the result is the remainder that would be obtained if the true (mathematical) result were divided by the number of distinct values the unsigned type can hold (one more than its maximum value), with the remainder always taken as nonnegative. In other words, if UINT_MAX is the largest value that will fit in an unsigned int (that is, if the range is 0-UINT_MAX), then the results of calculations that would overflow, such as
    	65535 + 1
    and
    	5 - 10
    

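    are actually (reading % here as the mathematical, always nonnegative remainder, not as C code, since in C the unsigned expression UINT_MAX+1 would itself wrap around to 0)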

    	(65535 + 1) % (UINT_MAX+1)
    and
    	(5 - 10) % (UINT_MAX+1)
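As a quick check of this rule, the following sketch (wrap_add and wrap_sub are hypothetical names) lets the compiler's actual UINT_MAX, rather than the 65,535 of the example, determine the wraparound point:

```c
#include <limits.h>

/* Unsigned arithmetic wraps around modulo UINT_MAX + 1. */
unsigned int wrap_add(unsigned int a, unsigned int b)
{
	return a + b;
}

unsigned int wrap_sub(unsigned int a, unsigned int b)
{
	return a - b;	/* "going negative" wraps around to a large value */
}
```

With these, wrap_add(UINT_MAX, 1) is 0, and wrap_sub(5, 10) is UINT_MAX - 4, matching the remainder rule above.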
    

The guaranteed minimum ranges of the unsigned types are:

	unsigned char	0 - 255
	unsigned short int	0 - 65535
	unsigned int	0 - 65535
	unsigned long int	0 - 4294967295

These multiword type names can also be abbreviated. Instead of writing long int, you can write long. Instead of writing short int, you can write short. Instead of writing unsigned int, you can write unsigned. Instead of writing unsigned long int, you can write unsigned long. Instead of writing unsigned short int, you can write unsigned short.

In the absence of the unsigned keyword, types short int, int, and long int are all signed. However, depending on the particular compiler you're using, plain type char might be signed or unsigned. If you need an explicitly signed character type, you can use signed char.


18.1.4: long double

The two common floating-point types in C are float and double. We haven't said much about the differences between them, because there isn't much to say: double generally gives you more precision (more digits' worth of significance), and perhaps more range (an ability to use numbers with larger exponents) than float. Continuing this progression, ANSI C added a third floating-point type, long double, which may give you even more range or even more precision. If you're using a machine with an extended-precision floating-point format, long double will let you access that format. But if your machine has only two floating-point formats, float and double will probably map to those, and long double won't end up being any better than plain double.

The printf and scanf formats for long double are %Le, %Lf, and %Lg.
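A small sketch of the L modifier in use; parse_ld and format_ld are hypothetical helpers, and snprintf assumes a C99 or later library. Passing a long double to plain %f (or giving scanf a long double * with plain %f) is undefined, which is why the L matters.

```c
#include <stdio.h>

/* Parse a long double from a string using the %Lf scanf conversion.
   Returns 1 on success, 0 on failure. */
int parse_ld(const char *s, long double *out)
{
	return sscanf(s, "%Lf", out) == 1;
}

/* Format a long double with the %Lf printf conversion, 3 decimal places.
   Returns the number of characters snprintf would write. */
int format_ld(char *buf, size_t n, long double x)
{
	return snprintf(buf, n, "%.3Lf", x);
}
```

For example, format_ld(buf, sizeof buf, 2.5L) stores "2.500" in buf.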


18.1.5: Storage Classes

A full-blown declaration in C consists of several parts: the storage class, the base type, any type qualifiers, and a list of declarators, where each declarator consists of: an identifier, additional characters possibly indicating that it is a pointer, array, or function, and finally an optional initializer. We've met each of these parts at various points along the way, although we have never explicitly mentioned or defined the storage class. The storage class is optional; it generally appears at the beginning of the declaration (before the base type) if it appears at all. At most one storage class may appear in any one declaration.

We've seen two storage classes so far, extern and static. extern marks a declaration as an external declaration, indicating that the identifier declared has its defining instance somewhere else (where "somewhere else" might be somewhere else in the same source file, or in a different source file). static is used in two different ways, (1) to indicate that a global ("file scope") variable is private to one source file, and cannot be accessed (even with external declarations) from other source files, or (2) to indicate that a local variable should have static duration, such that it does not come and go as the function is called and returns, and so that its value persists between invocations of the function.

Besides these two, there are three other storage classes. register indicates that the programmer believes that the variable will be heavily used, and that it should be assigned to a high-speed CPU register (rather than an ordinary memory location) if possible. Explicit register declarations are rare these days, because modern compilers generally do an excellent job, all by themselves and without any hints, of deciding which variables belong in machine registers. A limitation of register variables is that you cannot generate pointers to them using the & operator. (This is because, on most machines, pointers are implemented as memory addresses, and CPU registers usually do not have memory addresses.)

The fourth storage class is auto, which indicates that a local variable should have automatic duration. (Automatic duration, remember, means that variables are automatically allocated when a function is called and automatically deallocated when it returns.) Since automatic duration is the default for local variables anyway, auto is virtually never used; it's a relic from C's past.
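The following sketch gathers these storage classes in one place; next_id, sum_to, and calls_made are made-up names. The static local counter keeps its value between calls, while the register and auto declarations are legal but, as noted, rarely needed.

```c
/* static at file scope: private to this source file. */
static int calls_made = 0;

/* static local: counter persists between calls. */
int next_id(void)
{
	static int counter = 0;	/* initialized once, not on every call */
	calls_made++;
	return ++counter;
}

/* register and auto only affect local variables. */
long sum_to(int n)
{
	register int k;		/* a hint only; taking &k would be illegal */
	auto long total = 0;	/* "auto" is redundant: locals are automatic anyway */

	for(k = 1; k <= n; k++)
		total += k;
	return total;
}
```

The first call to next_id returns 1, the second returns 2, and so on, because counter is not reinitialized on each call.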

The fifth storage class, typedef, is described in the next section.


18.1.6 Type Definitions (typedef)

[This section corresponds to K&R Sec. 6.7]

When the storage class is typedef, as in

	typedef int count;

a declaration means something completely different than it usually does. Instead of declaring a variable named count, we are declaring a new type named count. (Actually, we're just declaring a new name for an old type; you can think of typedef names as type aliases.) Having declared this new type count, we can now use it as a base type in other declarations, such as

	count napples, noranges;

The types FILE and size_t which we've seen at various points along the way (which we've described as being "new types defined by the header file <stdio.h>" [or also several other headers in the case of size_t]) are typically both defined using typedef.

You can use typedef to define new names for complicated types, too. You could define

	typedef char *string;
	typedef struct listnode list, *nodeptr;

after which you can declare several strings (char *'s) by saying

	string string1, string2;

or several lists (struct listnode) by saying

	list list1, list2;

or several pointers to list nodes by saying

	nodeptr np1, np2;
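To see these typedef names in use, here is a minimal sketch. The members value and next, and the function list_length, are assumptions for illustration, since the text doesn't define struct listnode.

```c
#include <stddef.h>

typedef char *string;

/* A minimal singly-linked list node (hypothetical members). */
struct listnode
	{
	int value;
	struct listnode *next;
	};

typedef struct listnode list, *nodeptr;

/* Count the nodes reachable from np. */
int list_length(nodeptr np)
{
	int n = 0;

	for(; np != NULL; np = np->next)
		n++;
	return n;
}
```

Declarations such as list list1, list2; or string s = "hi"; then read just like declarations of built-in types.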

typedef provides a way to simplify structure declarations. In a previous section, we saw that we had to declare new variables of type struct complex using the syntax

	struct complex c1, c2;

Using typedef, however, we can introduce a single-word complex type, after all:

	typedef struct complex complextype;
	complextype c1, c2;

It's also possible to define a structure and a typedef name for it at the same time:

	typedef struct complex
		{
		double real;
		double imag;
		} complextype;

Furthermore, when using typedef names, you may not need the structure tag at all; you can also write

	typedef struct
		{
		double real;
		double imag;
		} complextype;

(At this point, of course, you could use the cleaner name "complex" for the typedef, instead of "complextype". Actually, it turns out that you could have done this all along. Structure tags and typedef names live in separate namespaces, so the declaration

	typedef struct complex
		{
		double real;
		double imag;
		} complex;

is legal, though possibly confusing.)
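Here is the untagged version in use, with a hypothetical cadd function; the typedef name behaves just like a built-in type name in parameter lists, return types, and local declarations. (The sketch sticks with complextype, since C99's <complex.h> defines complex as a macro.)

```c
/* typedef of an untagged structure, as above. */
typedef struct
	{
	double real;
	double imag;
	} complextype;

/* Add two complex numbers, component by component. */
complextype cadd(complextype a, complextype b)
{
	complextype sum;

	sum.real = a.real + b.real;
	sum.imag = a.imag + b.imag;
	return sum;
}
```

Adding {1.0, 2.0} and {3.0, 4.0} with cadd gives {4.0, 6.0}.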

Defining new type names is done mostly for convenience, or to make the code more self-documenting, or to make it possible to change the actual base type used for a lot of variables without rewriting the declarations of all those variables.

A typedef declaration is a little bit like a preprocessor #define directive. We could imagine writing

	#define count int
	#define string char *

in an attempt to accomplish the same thing. This won't work nearly as well, however: given the macro definition, the line

	string string1, string2;

would expand to

	char * string1, string2;

which would declare string1 as a char * but string2 as a plain char. The typedef declaration, however, would work correctly.
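The difference can be checked directly with sizeof. In this sketch, STRING is a hypothetical macro and string a typedef; all four variables are file-scope declarations.

```c
#define STRING char *		/* macro version: plain text substitution */
typedef char *string;		/* typedef version: a real type name */

STRING m1, m2;		/* expands to  char * m1, m2;  so m2 is a plain char */
string t1, t2;		/* t1 and t2 are both char * */
```

Here sizeof(m2) is 1, while sizeof(t2) is the size of a pointer, confirming that only the typedef applied to both declarators.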

Some programmers capitalize typedef names to make them stand out a little better, and others use the convention of ending all typedef names with the characters "_t".


18.1.7: Type Qualifiers

[Type qualifiers are a fairly advanced feature which not all programs need. You may skip this section.]

Any type can be qualified by the type qualifiers const or volatile. Both of these were new with ANSI C, and there is a lot of older code which does not use them. Even in new code, you will see const fairly rarely, and volatile even less often.

In simple declarations, the type qualifier is simply another keyword in the type name, along with the basic type and the storage class. For example,

	const int i;
	const float f;
	extern volatile unsigned long int ul;

are all declarations involving type qualifiers.

A const value is one you promise not to modify. The compiler may therefore be able to make certain optimizations, such as placing a const-qualified variable in read-only memory. However, a const-qualified variable is not a true constant; that is, it does not qualify as a constant expression which C requires in certain situations, such as array dimensions, case labels (see section 18.3.1 below), and initializers for variables with static duration (globals and static locals).

A volatile value is one that might change unexpectedly. This situation generally only arises when you're directly accessing special hardware registers, usually when writing device drivers. The compiler should not assume that a volatile-qualified variable contains the last value that was written to it, or that reading it again would yield the same result that reading it last time did. The compiler should therefore avoid making any optimizations which would suppress seemingly-redundant accesses to a volatile-qualified variable. Examples of volatile locations would be a clock register (which gives an up-to-date time value each time you read it), or a device control/status register, which causes some peripheral device to perform an action each time the register is written to.

Type qualifiers become more interesting (or at least more complicated or confusing) when they modify pointer types. The placement of the qualifier in a pointer declaration determines whether it is the pointer itself, or the location pointed to, that is qualified. The declarations

	int const *ci1;
and
	const int *ci2;

declare pointers to constant ints, which means that although the pointers can be modified (to point to different locations), the locations pointed to (that is, *ci1 and *ci2) can not be modified. The declaration

	int * const cp;

on the other hand, declares a pointer which cannot be modified (it cannot be set to point anywhere else), although the value it points to (*cp) can be modified.
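A short sketch of the two placements (qualifier_demo is a hypothetical name); the commented-out lines are the ones a compiler would reject.

```c
/* Where const goes in pointer declarations. */
int qualifier_demo(void)
{
	int a = 1, b = 2;

	const int *ci = &a;	/* pointer to const int */
	int * const cp = &a;	/* const pointer to int */

	ci = &b;		/* OK: the pointer itself may change */
	/* *ci = 5; */		/* error: *ci is read-only through ci */

	*cp = 10;		/* OK: the int pointed to may change */
	/* cp = &b; */		/* error: cp itself is read-only */

	return a + *ci;		/* a is now 10, and *ci is b */
}
```

The function returns 12: a was changed to 10 through cp, and ci was redirected to b.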

Pointers to constants (such as ci1 and ci2 above) have a particularly important use: they can be used to document (and enforce) pointer parameters which a function promises not to use to modify locations in the caller.

Normally, C uses pass-by-value. A function receives copies of its arguments, which means that it cannot modify any variables in the caller (since copies of those variables were passed). If a function receives a pointer, however (including the pointer that results when the caller seems to "pass" an array), it can use that pointer (more precisely, it can use its copy of the pointer) to modify locations in the caller. Sometimes, this is just what is desired: when the caller "passes" an array which it wishes the function to fill in, or when the function wants to return one or more values via pointers rather than as the conventional return value, the function's modification of locations in the caller is deliberate and understood by the caller. However, when a function receives a pointer argument for some other reason, under circumstances in which the caller might not want the function to use the pointer to modify anything in the caller, the caller might appreciate a guarantee that the pointer (within the function) won't be used to modify anything. To make that guarantee, the function can declare the pointer as pointer-to-const.
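As a sketch of such a guarantee, here is a hypothetical count_char function. Because its parameter is const char *, a caller can pass its own array knowing the contents won't change, and any accidental assignment through s inside the function would be a compile-time error.

```c
#include <stddef.h>

/* Count occurrences of c in the string s, without modifying s. */
size_t count_char(const char *s, char c)
{
	size_t n = 0;

	for(; *s != '\0'; s++)	/* moving s is fine; writing *s is not */
		if(*s == c)
			n++;
	return n;
}
```

Called on a caller's array holding "Hello, world!\n", count_char(msg, 'o') returns 2 and leaves msg untouched.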

For example, our old friend printf never scribbles on the string it's given as its format argument; it merely uses it to decide what to print. Therefore, the prototype for printf is

	int printf(const char *fmt, ...);

where the ... represents printf's optional arguments. If a caller writes something like

	char mystring[] = "Hello, world!\n";
	printf(mystring);

it knows, from printf's prototype, that printf won't be scribbling on mystring. Furthermore, with that prototype for printf in scope, the actual author of the printf code couldn't accidentally write a (buggy) version which inadvertently modified the format argument: since it's declared as const char *, the compiler will complain if any attempt is made to write to the location(s) it points to.

const and volatile can also be used in combination. Theoretically, it's possible to have a single variable which is both:

	const volatile int x;

Also, both a pointer and what it points to can be qualified:

	const char * const cpc;

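Finally, as in several other situations, C89 tends to assume type int, so if you want to save a bit of typing, you can write

	const i;

instead of

	const int i;

etc. (C99 removed this "implicit int" rule, however, so modern compilers will warn about or reject declarations like const i; it's safer to spell out the int.)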


18.2: More Operators

The operators we haven't met include the bitwise operators, the cast operators, the comma operator, and the conditional (or "ternary") operator.

18.2.1: Bitwise Operators

18.2.2: Cast Operators

18.2.3: Default Type Promotions and Conversions

18.2.4: The Comma Operator

18.2.5: The Conditional Operator


18.2.1: Bitwise Operators

[This section corresponds to K&R Sec. 2.9]

The bitwise operators operate on numbers (always integers) as if they were sequences of binary bits (which, of course, internally to the computer they are). These operators will make the most sense, therefore, if we consider integers as represented in binary, octal, or hexadecimal (bases 2, 8, or 16), not decimal (base 10). Remember, you can use octal constants in C by prefixing them with an extra 0 (zero), and you can use hexadecimal constants by prefixing them with 0x (or 0X).

The & operator performs a bitwise AND on two integers. Each bit in the result is 1 only if both corresponding bits in the two input operands are 1. For example, 0x56 & 0x32 is 0x12, because (in binary):

	  0 1 0 1 0 1 1 0
	& 0 0 1 1 0 0 1 0
	  ---------------
	  0 0 0 1 0 0 1 0

The | (vertical bar) operator performs a bitwise OR on two integers. Each bit in the result is 1 if either of the corresponding bits in the two input operands is 1. For example, 0x56 | 0x32 is 0x76, because:

	  0 1 0 1 0 1 1 0
	| 0 0 1 1 0 0 1 0
	  ---------------
	  0 1 1 1 0 1 1 0

The ^ (caret) operator performs a bitwise exclusive-OR on two integers. Each bit in the result is 1 if one, but not both, of the corresponding bits in the two input operands is 1. For example, 0x56 ^ 0x32 is 0x64:

	  0 1 0 1 0 1 1 0
	^ 0 0 1 1 0 0 1 0
	  ---------------
	  0 1 1 0 0 1 0 0

The ~ (tilde) operator performs a bitwise complement on its single integer operand. (The ~ operator is therefore a unary operator, like ! and the unary -, &, and * operators.) Complementing a number means to change all the 0 bits to 1s and all the 1 bits to 0s. For example, assuming 16-bit integers, ~0x56 is 0xffa9:

	~ 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 0
	  -------------------------------
	  1 1 1 1 1 1 1 1 1 0 1 0 1 0 0 1

The << operator shifts its first operand left by a number of bits given by its second operand, filling in new 0 bits at the right. Similarly, the >> operator shifts its first operand right. If the first operand is unsigned, >> fills in 0 bits from the left, but if the first operand is signed, >> might fill in 1 bits if the high-order bit was already 1. (Uncertainty like this is one reason why it's usually a good idea to use all unsigned operands when working with the bitwise operators.) For example, 0x56 << 2 is 0x158:

	      0 1 0 1 0 1 1 0 << 2
	  -------------------
	  0 1 0 1 0 1 1 0 0 0

And 0x56 >> 1 is 0x2b:

	  0 1 0 1 0 1 1 0 >> 1
	  ---------------
	    0 1 0 1 0 1 1

For both of the shift operators, bits that scroll "off the end" are discarded; they don't wrap around. (Therefore, 0x56 >> 3 is 0x0a.)
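To check examples like these on a real machine, a small helper that prints the low bits of a value can be handy. This print_bits is a hypothetical sketch; it uses unsigned int so that the shifts are well defined, and nbits must not exceed the number of bits in an unsigned int.

```c
#include <stdio.h>

/* Print the low nbits bits of x, high-order bit first, separated by
   spaces, in the style of the diagrams above. */
void print_bits(unsigned int x, int nbits)
{
	int k;

	for(k = nbits - 1; k >= 0; k--)
		{
		int bit = (x >> k) & 1u;	/* extract bit k */
		putchar('0' + bit);
		if(k > 0)
			putchar(' ');
		}
	putchar('\n');
}
```

For instance, print_bits(0x56 << 2, 10) prints 0 1 0 1 0 1 1 0 0 0, matching the diagram above, and print_bits(~0x56u, 16) reproduces the 16-bit complement even on machines where int is wider.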

The bitwise operators will make more sense if we take a look at some of the ways they're typically used. We can use & to test if a certain bit is 1 or not. For example, 0x56 & 0x40 is 0x40, but 0x32 & 0x40 is 0x00:

	  0 1 0 1 0 1 1 0	  0 0 1 1 0 0 1 0
	& 0 1 0 0 0 0 0 0	& 0 1 0 0 0 0 0 0
	  ---------------	  ---------------
	  0 1 0 0 0 0 0 0	  0 0 0 0 0 0 0 0

Since any nonzero result is considered "true" in C, we can use an expression involving & directly to test some condition, for example:

	if(x & 0x04)
		/* do something */ ;

(If we didn't like testing against the bitwise result, we could equivalently say if((x & 0x04) != 0). The extra parentheses are important, as we'll explain below.)

Notice that the value 0x40 has exactly one 1 bit in its binary representation, which makes it useful for testing for the presence of a certain bit. Such a value is often called a bit mask. Often, we'll define a series of bit masks, all targeting different bits, and then treat a single integer value as a set of flags. A "flag" is an on-off, yes-no condition, so we only need one bit to record it, not the 16 or 32 bits (or more) of an int. Storing a set of flags in a single int does more than just save space; it also makes it convenient to assign a set of flags all at once from one flag variable to another, using the conventional assignment operator =. For example, if we made these definitions:

	#define DIRTY	0x01
	#define OPEN	0x02
	#define VERBOSE	0x04
	#define RED	0x08
	#define SEASICK	0x10

we would have set up 5 different bits as keeping track of those 5 different conditions ("dirty," "open," etc.). If we had a variable

	unsigned int flags;

which contained a set of these flags, we could write tests like

	if(flags & DIRTY)
		{ /* code for dirty case */ }

	if(!(flags & OPEN))
		{ /* code for closed case */ }

	if(flags & VERBOSE)
		{ /* code for verbose case */ }
	else	{ /* code for quiet case */ }

A condition like if(flags & DIRTY) can be read as "if the DIRTY bit is on".

These bitmasks would also be useful for setting the flags. To "turn on the DIRTY bit," we'd say

	flags = flags | DIRTY;		/* set DIRTY bit */

How would we "turn off" a bit? The way to do it is to leave on every bit but the one we're turning off, if they were on already. We do this with the & and ~ operators:

	flags = flags & ~DIRTY;		/* clear DIRTY bit */

This may be easier to see if we look at it in binary. If the DIRTY, RED, and SEASICK bits were already on, flags would be 0x19, and we'd have

	  0 0 0 1 1 0 0 1
	& 1 1 1 1 1 1 1 0
	  ---------------
	  0 0 0 1 1 0 0 0

As you can see, both the | operator when turning bits on and the & (plus ~) operator when turning bits off have no effect if the targeted bit was already on or off, respectively.

The definition of the exclusive-OR operator means that you can use it to toggle a bit, that is, to turn it to 1 if it was 0 and to 0 if it was 1:

	flags = flags ^ VERBOSE;	/* toggle VERBOSE bit */

It's common to use the "op=" shorthand forms when doing all of these operations:

	flags |= DIRTY;			/* set DIRTY bit */
	flags &= ~OPEN;			/* clear OPEN bit */
	flags ^= VERBOSE;		/* toggle VERBOSE bit */
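Putting the idioms together, here is a sketch that repeats the flag definitions above so that it stands alone; update_flags and is_set are hypothetical helper names.

```c
#define DIRTY	0x01
#define OPEN	0x02
#define VERBOSE	0x04
#define RED	0x08
#define SEASICK	0x10

/* Nonzero if any bit of mask is on in flags. */
int is_set(unsigned int flags, unsigned int mask)
{
	return (flags & mask) != 0;
}

/* Apply the three "op=" idioms to a flag word and return the result. */
unsigned int update_flags(unsigned int flags)
{
	flags |= DIRTY;		/* set DIRTY bit */
	flags &= ~OPEN;		/* clear OPEN bit */
	flags ^= VERBOSE;	/* toggle VERBOSE bit */
	return flags;
}
```

For example, update_flags(0) returns DIRTY | VERBOSE (0x05): DIRTY is set, OPEN stays clear, and VERBOSE is toggled on.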

We can also use the bitwise operators to extract subsets of bits from the middle of an integer. For example, to extract the second-to-last hexadecimal "digit," we could use

	(i & 0xf0) >> 4

If i was 0x56, we have:

	     i		  0 1 0 1 0 1 1 0
	& 0xf0		& 1 1 1 1 0 0 0 0
			  ---------------
			  0 1 0 1 0 0 0 0

and shifting this result right by 4 bits gives us 0 1 0 1, or 5, as we wished. Replacing (or overwriting) a subset of bits is a bit more complicated; we must first use & and ~ to clear all of the destination bits, then use << and | to "OR in" the new bits. For example, to replace that second-to-last hexadecimal digit with some new bits, we might use:

	(i & ~0xf0) | (newbits << 4)

If i was still 0x56 and newbits was 6, this would give us

	      i			  0 1 0 1 0 1 1 0
	& ~0xf0			& 0 0 0 0 1 1 1 1
				  ---------------
				  0 0 0 0 0 1 1 0
	| (newbits << 4)	| 0 1 1 0 0 0 0 0
				  ---------------
				  0 1 1 0 0 1 1 0

resulting in 0x66, as desired.
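The two expressions can be wrapped in functions as a sketch (get_digit1 and set_digit1 are hypothetical names). The extra & 0x0f applied to newbits is an addition of this sketch, not part of the expression in the text; it keeps stray high bits of newbits from spilling into the rest of i.

```c
/* Extract bits 4-7 (the second-to-last hex digit) of i. */
unsigned int get_digit1(unsigned int i)
{
	return (i & 0xf0) >> 4;
}

/* Replace bits 4-7 of i with the low 4 bits of newbits. */
unsigned int set_digit1(unsigned int i, unsigned int newbits)
{
	return (i & ~0xf0u) | ((newbits & 0x0f) << 4);
}
```

With these, get_digit1(0x56) is 5 and set_digit1(0x56, 6) is 0x66, matching the diagrams.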

We've been using extra parentheses in several of these bitwise expressions because it turns out that (for the usual, hoary sort of "historical reasons") the precedence of the bitwise &, |, and ^ operators is low, usually lower than we'd want. (The reason that they're low is that, once upon a time, C didn't have the logical operators && and ||, and the bitwise operators & and | did double duty.) Since the precedence of & and | (and ^) is lower than ==, !=, <<, and >>, expressions like

	if(a & 0x04 != 0)	/* WRONG */

and

	i & 0xf0 >> 4		/* WRONG */

would not work as desired; these last two would be equivalent to

	if(a & (0x04 != 0))
	i & (0xf0 >> 4)

and would not do the bit test or subset extraction that we wanted.
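A sketch that makes the mis-parse visible (as_parsed and intended are hypothetical names). The first function spells out, with explicit parentheses, what the unparenthesized test actually means.

```c
/* What  a & 0x04 != 0  really means: != binds tighter than &, so this
   is  a & (0x04 != 0), which is  a & 1. */
unsigned int as_parsed(unsigned int a)
{
	return a & (0x04 != 0);
}

/* What was intended: test whether the 0x04 bit of a is on. */
int intended(unsigned int a)
{
	return (a & 0x04) != 0;
}
```

With a equal to 0x04, as_parsed returns 0 (0x04 & 1 is 0) while intended returns 1.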

[The rest of this section is somewhat advanced and may be skipped.]

Because of the nature of base-2 arithmetic, it turns out that shifting left and shifting right are equivalent to multiplying and dividing by powers of two: shifting by n bits multiplies or divides by 2 to the nth power. These operations are equivalent for the same reason that tacking zeroes on to the right of a number in base 10 is the same as multiplying by 10, and deleting digits from the right is the same as dividing by 10. You can convince yourself that 0x56 << 2 is the same as 0x56 * 4, and that 0x56 >> 1 is the same as 0x56 / 2. It's also the case that masking off all but the low-order bits is the same as taking a remainder; for example, 0x56 & 0x07 is the same as 0x56 % 8. Some programmers therefore use <<, >>, and & in preference to *, /, and % when powers of two are involved, on the grounds that the bitwise operators are "more efficient." Usually it isn't worth worrying about this, though: most compilers are smart enough to perform these optimizations anyway (that is, if you write x * 4, the compiler might generate a left shift instruction all by itself), the bitwise forms are not always as readable, and they're not always correct for negative numbers.

The issue of negative numbers, by the way, explains why the right-shift operator >> is not precisely defined when the high-order bit of the value being shifted is 1. For signed values, if the high-order bit is a 1, the number is negative. (This is true for 1's complement, 2's complement, and sign-magnitude representations.) If you were using a right shift to implement division, you'd want a negative number to stay negative, so on some computers, under some compilers, when you shift a signed value right and the high-order bit is 1, new 1 bits are shifted in at the left instead of 0s. However, you can't depend on this, because not all computers and compilers implement right shift this way. In any case, shifting negative numbers to the right (even if the high-order 1 bit propagates) gives you an incorrect answer if there's a remainder involved: in 2's complement, 16-bit arithmetic, -15 is 0xfff1, so -15 >> 1 might give you 0xfff8, which is -8. But integer division is supposed to discard the remainder, so -15 / 2 would have given you -7. (If you're having trouble seeing the way the shift worked, 0xfff1 is 1111111111110001 in binary and 0xfff8 is 1111111111111000. The low-order 1 bit got shifted off to the right, but because the high-order bit was 1, a 1 got shifted in at the left.)
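A sketch of these claims, assuming a C99 or later compiler (C99 is where integer division was required to truncate toward zero; C89 left the rounding direction implementation-defined when an operand is negative). The function names are hypothetical.

```c
/* For unsigned values, shifts and masks match multiplication, division,
   and remainder by powers of two. */
int shift_identities_hold(unsigned int x)
{
	return (x << 2) == x * 4
	    && (x >> 1) == x / 2
	    && (x & 0x07) == x % 8;
}

/* Division truncates toward zero (guaranteed since C99). */
int halve_by_division(int x)
{
	return x / 2;
}

/* Right-shifting a negative signed value is implementation-defined. */
int halve_by_shift(int x)
{
	return x >> 1;
}
```

Here halve_by_division(-15) is -7, while halve_by_shift(-15) gives -8 on many 2's-complement compilers but is not guaranteed to; for nonnegative arguments the two agree.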


18.2.2: Cast Operators

[This section corresponds to the second half of K&R Sec. 2.7]

Most of the time, C performs conversions between related types automatically. (See section 18.2.3 for the complete story.) When you assign an int value to a float variable or vice versa, or perform calculations involving mixtures of arithmetic types, the types are converted automatically, as necessary. C even performs some pointer conversions automatically: malloc returns type void * (pointer-to-void), but a void * is automatically converted to whatever pointer type you assign (say) malloc's return value to.

Occasionally, you need to request a type conversion explicitly. Consider the code

	int i = 1, j = 2;
	float f;
	f = i / j;

Recall that the division operator / results in an integer division, discarding the remainder, when both operands are integral. It performs a floating-point division, yielding a possibly fractional result, when one or both operands have floating-point types. What happens here? Both operands are int, but the result of the division is assigned to a float, which would be able to hold a fractional result. Is the compiler smart enough to notice, and perform a floating-point division? No, it is not. The rule is, "if both operands are integral, division is integer division and discards any remainder", and this is the rule the compiler follows. In this case, then, we must manually and explicitly force one of the operands to be of floating-point type.

Explicit type conversions are requested in C using a cast operator. (The name of the operator comes from the term typecast; "typecasting" is another term for explicit type conversion, and some languages have "typecast operators." Yet another term for type conversion is coercion.) A cast operator consists of a type name, in parentheses. One way to fix the example above would be to rewrite it as

	f = (float)i / j;

The construction (float)i involves a cast; it says, "take i's value, and convert it to a float." (The only thing being converted is the value fetched from i; we're not changing i's type or anything.) Now, one operand of the / operator is floating-point, so we perform a floating-point division, and f receives the value 0.5.

Equivalently, we could write

	f = i / (float)j;

or

	f = (float)i / (float)j;

It's sufficient to use a cast on one of the operands, but it certainly doesn't hurt to cast both.
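A sketch contrasting the two versions (ratio_uncast and ratio_cast are hypothetical names):

```c
/* Without a cast, int / int is integer division, even though the
   result is then converted to float. */
float ratio_uncast(int i, int j)
{
	return i / j;		/* integer division happens first */
}

/* Casting one operand forces floating-point division. */
float ratio_cast(int i, int j)
{
	return (float)i / j;
}
```

ratio_uncast(1, 2) returns 0.0, because the integer division happens before the conversion to float, while ratio_cast(1, 2) returns 0.5.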

A similar situation is

	int i = 32000, j = 32000;
	long int li;
	li = i + j;

An int is only guaranteed to hold values up to 32,767. Here, the result i + j is 64,000, which is not guaranteed to fit into an int. Even though the eventual destination is a long int, the compiler does not look ahead to see this. The addition is performed using int arithmetic, and it may overflow. Again, the solution is to use a cast to explicitly convert one of the operands to a long int:

	li = (long int)i + j;

Now, since one of the operands is a long int, the addition is performed using long int arithmetic, and does not overflow.
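As a sketch (add_as_long is a hypothetical name), the cast version can be packaged as a function. On a machine with 32-bit ints the uncast sum would happen to fit anyway; the cast matters for code that must also work where int is only 16 bits.

```c
/* Convert one operand to long int before adding, so the addition
   uses long arithmetic, which is guaranteed to hold 64,000. */
long int add_as_long(int i, int j)
{
	return (long int)i + j;
}
```

add_as_long(32000, 32000) returns 64000 on any conforming implementation, since long int is guaranteed to hold at least +-2,147,483,647.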

Cast operators do not have to involve simple types; they can also involve pointer or structure or more complicated types. Once upon a time, before the void * type had been invented, malloc returned a char *, which had to be converted to the type you were using. For example, one used to write things like

	int *iarray = (int *)malloc(100 * sizeof(int));

and

	struct list *lp = (struct list *)malloc(sizeof(struct list));

These casts are not necessary under an ANSI C compiler (because malloc returns void * which the compiler converts automatically), but you may still see them in older code.
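Under an ANSI C compiler, the two allocations above can be written without casts, as long as <stdlib.h> is included so that malloc is declared as returning void *. This sketch assumes a simple definition of struct list, since the text doesn't show one, and the helper names make_int_array and make_node are hypothetical. It also checks for a null pointer, which malloc returns when it can't allocate the requested memory.

```c
#include <stdlib.h>

/* An assumed, minimal definition standing in for the struct list above. */
struct list
	{
	int item;
	struct list *next;
	};

/* Allocates an array of n ints. No cast is needed: malloc returns void *,
   which is converted to int * automatically on assignment. */
int *make_int_array(size_t n)
{
	int *iarray = malloc(n * sizeof(int));
	return iarray;		/* NULL if the allocation failed */
}

/* Allocates and initializes one list node, again without a cast. */
struct list *make_node(int item)
{
	struct list *lp = malloc(sizeof(struct list));
	if(lp != NULL)
		{
		lp->item = item;
		lp->next = NULL;
		}
	return lp;
}
```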


18.2.3: Default Type Promotions and Conversions

[This section corresponds to the first half of K&R Sec. 2.7]

In many cases, C performs type conversions automatically when values of differing types participate in expressions. For most programming, you don't have to memorize these rules exactly, but it's a good idea to have a general understanding of how they work, so that you won't be surprised by any of the default conversions, and so that you'll know to use explicit conversions (as described in the previous section) in those few cases where C would not perform a needed conversion automatically.

The default conversion rules serve two purposes. One is purely selfish on the compiler's part: it does not want to have to know how to generate code to add, say, a floating-point number to an integer. The compiler would much prefer if all operations operated on two values of the same type: two integers, two floating-point numbers, etc. (Indeed, few processors have an instruction for adding a floating-point number to an integer; most have instructions for adding two integers, or two floating-point numbers.) The other purpose for the default conversions is the programmer's convenience: the mentality that "the computer and the compiler are stupid, we programmers must specify everything in excruciating detail" can be carried too far, and it's reasonable to define the language such that certain conversions are performed implicitly and automatically by the compiler, when it's unambiguous and safe to do so.

The rules, then (which you can also find on page 44 of K&R2, or in section 6.2.1 of the newer ANSI/ISO C Standard) are approximately as follows:

  1. First, in most circumstances, values of type char and short int are converted to int right off the bat.
  2. If an operation involves two operands, and one of them is of type long double, the other one is converted to long double.
  3. If an operation involves two operands, and one of them is of type double, the other one is converted to double.
  4. If an operation involves two operands, and one of them is of type float, the other one is converted to float.
  5. If an operation involves two operands, and one of them is of type long int, the other one is converted to long int.
  6. If an operation involves both signed and unsigned integers, the situation is a bit more complicated. If the unsigned operand is smaller (perhaps we're operating on unsigned int and long int), such that the larger, signed type could represent all values of the smaller, unsigned type, then the unsigned value is converted to the larger, signed type, and the result has the larger, signed type. Otherwise (that is, if the signed type can not represent all values of the unsigned type), both values are converted to a common unsigned type, and the result has that unsigned type.
  7. Finally, when a value is assigned to a variable using the assignment operator, it is automatically converted to the type of the variable if (a) both the value and the variable have arithmetic type (that is, integer or floating point), or (b) both the value and the variable are pointers, and one or the other of them is of type void *.
(This is not a precise statement of these rules. If you need to understand a complicated type conversion situation perfectly, you may have to consult a more definitive reference. In particular, the first five of these rules are usually described as being applied in order, in the order 2, 3, 4, 1, 5. Rule 6 is especially complicated, and although it is intended to prevent surprises, it still manages to introduce some.)
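Here is a small sketch that exercises rules 1, 3, 6, and 7 from the list above; the function names are ours. The comments describe what happens on a typical implementation. A compiler may warn about the signed/unsigned comparison, which is, in a sense, the point.

```c
/* Rule 1: c is promoted to int before the addition, so the type of
   c + 1 is int, not char. Returns 1 if that is so. */
int char_arithmetic_is_int(void)
{
	char c = 'A';
	return sizeof(c + 1) == sizeof(int);
}

/* Rule 3: 2.0 is a double, so the int 7 is converted to double
   and the division yields 3.5. */
double seven_halves(void)
{
	return 7 / 2.0;
}

/* Rule 6: int and unsigned int have the same size, so the signed operand
   is converted to unsigned. -1 becomes a very large unsigned value, and
   the "obviously true" comparison comes out false (0). */
int minus_one_less_than_one(void)
{
	int i = -1;
	unsigned int u = 1;
	return i < u;
}

/* Rule 7: assigning a double to an int converts it, discarding the
   fractional part. */
int truncate_on_assign(double d)
{
	int n = d;
	return n;
}
```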


18.2.4: The Comma Operator

Once in a while, you find yourself in a situation in which C expects a single expression, but you have two things you want to say. The most common (and in fact the only common) example is in a for loop, specifically the first and third controlling expressions. What if (for example) you want to have a loop in which i counts up from 0 to 9 at the same time that j counts down from 10 to 1? You could manipulate i in the loop header and j "by hand":

	j = 10;
	for(i = 0; i < 10; i++)
		{
		... rest of loop ...
		j--;
		}

but here it's harder to see the parallel nature of i and j, and it also turns out that this won't work right if the loop contains a continue statement. (A continue would jump back to the top of the loop, and i would be incremented but j would not be decremented.) You could compute j in terms of i:

	for(i = 0; i < 10; i++)
		{
		j = 10 - i;
		... rest of loop ...
		}

but this also makes j needlessly subservient. The usual way to write this loop in C would be

	for(i = 0, j = 10; i < 10; i++, j--)
		{
		... rest of loop ...
		}

Here, the first (initialization) expression is

	i = 0, j = 10

The comma is the comma operator, which simply evaluates the first subexpression i = 0, then the second j = 10. The third controlling expression,

	i++, j--

also contains a comma operator, and again, performs first i++ and then j--.
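A classic use of the comma operator in both the first and third controlling expressions is reversing a string in place, with one index moving forward from the start while the other moves backward from the end. This is a sketch; the function name reverse and the early return for the empty string (which keeps strlen(s) - 1 from wrapping around, since size_t is unsigned) are our own choices.

```c
#include <string.h>

/* Reverses the string s in place. The comma operator lets i count up
   from the front while j counts down from the back. */
void reverse(char s[])
{
	size_t i, j;
	char tmp;

	if(s[0] == '\0')
		return;		/* empty string: nothing to do */

	for(i = 0, j = strlen(s) - 1; i < j; i++, j--)
		{
		tmp = s[i];
		s[i] = s[j];
		s[j] = tmp;
		}
}
```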

Precisely stated, the meaning of the comma operator in the general expression

	e1 , e2

is "evaluate the subexpression e1, then evaluate e2; the value of the expression is the value of e2." Therefore, e1 had better involve an assignment or an increment ++ or decrement -- or function call or some other kind of side effect, because otherwise it would calculate a value which would be discarded.
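The value rule, and the reason parentheses are often needed, can be seen in a short sketch (the function names are ours). The comma operator has the lowest precedence of any C operator, lower even than assignment, so without parentheses an assignment binds first.

```c
/* The value of a comma expression is the value of its right operand;
   the left operand is evaluated first, for its side effect. */
int comma_value(void)
{
	int a, x;
	x = (a = 1, a + 2);	/* a becomes 1, then x gets a + 2, which is 3 */
	return x;
}

/* Without parentheses, this parses as (x = (a = 1)), (a += 2). */
int comma_precedence(void)
{
	int a, x;
	x = a = 1, a += 2;	/* x gets 1; a ends up as 3 */
	return x;
}
```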

There's hardly any reason to use a comma operator anywhere other than in the first and third controlling expressions of a for loop, and in fact most of the commas you see in C programs are not comma operators. In particular, the commas between the arguments in a function call are not comma operators; they are just punctuation which separate several argument expressions. It's pretty easy to see that they cannot be comma operators, otherwise in a call like

	printf("Hello, %s!\n", "world");

the action would be "evaluate the string "Hello, %s!\n", discard it, and pass only the string "world" to printf." This is of course not what we want; we expect both strings to be passed to printf as two separate arguments (which is, of course, what happens).


18.2.5: The Conditional Operator

[This section corresponds to K&R Sec. 2.11]

C has one last operator which we haven't seen yet. It's called the conditional or "ternary" or ?: operator, and in action it looks something like this:

	average = (n > 0) ? sum / n : 0;

The syntax of the conditional operator is

	e1 ? e2 : e3

and what happens is that e1 is evaluated, and if it's true then e2 is evaluated and becomes the result of the expression, otherwise e3 is evaluated and becomes the result of the expression. In other words, the conditional expression is sort of an if/else statement buried inside of an expression. The above computation of average could be written out in a longer form using an if statement:

	if(n > 0)
		average = sum / n;
	else	average = 0;

The conditional operator, however, forms an expression and can therefore be used wherever an expression can be used. This makes it more convenient to use when an if statement would otherwise cause other sections of code to be needlessly repeated. For example, suppose we were trying to write a complicated function call

	func(a, b + 1, c + d, xx, (g + h + i) / 2);

where xx was supposed to be f if e was true and 0 if it was not. Using an if statement, we'd have to write:

	if(e)
		func(a, b + 1, c + d, f, (g + h + i) / 2);
	else	func(a, b + 1, c + d, 0, (g + h + i) / 2);

We could write this more compactly, more readably, and more safely (it's easier both to see and to guarantee that the other arguments are always the same) by writing

	func(a, b + 1, c + d, e ? f : 0, (g + h + i) / 2);

(The obscure name "ternary," by the way, comes from the fact that the conditional operator is neither unary nor binary; it takes three operands.)
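Because a conditional expression can appear anywhere an expression can, it's handy inside argument lists. The sketch below (the names average and format_count are hypothetical) packages the average computation from above, and uses ?: to pick a plural ending inside a call to sprintf. The caller must supply a buffer large enough for the result.

```c
#include <stdio.h>

/* The average computation from the text, as a function. The conditional
   operator avoids dividing by zero when there are no values. */
int average(int sum, int n)
{
	return (n > 0) ? sum / n : 0;
}

/* Writes a message such as "1 item" or "3 items" into buf, choosing the
   ending with ?: right in the argument list. Returns the number of
   characters written, as sprintf does. */
int format_count(char buf[], int n)
{
	return sprintf(buf, "%d item%s", n, (n == 1) ? "" : "s");
}
```

With these definitions, format_count(buf, 1) stores "1 item" and format_count(buf, 3) stores "3 items".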


18.3: More Statements

We'll round out this chapter by looking at three more statements: switch, do/while, and goto.

18.3.1: switch

18.3.2: do/while

18.3.3: goto


18.3.1: switch

[This section corresponds to K&R Sec. 3.4]

A frequent sort of pattern is exemplified by the sequence

	if(x == e1)
		/* some code */
	else if(x == e2)
		/* other code */
	else if(x == e3)
		/* some more code */
	else if(x == e4)
		/* yet more code */
	else
		/* default code */

Depending on the value of x, we have one of several chunks of code to execute, which we select with a long if/else/if/else... chain. When the value we're selecting on is an integer, and when the values we're selecting among are all constant, we can use a switch statement, instead. The switch statement evaluates an expression and matches the result against a series of "case labels". The code beginning with the matching case label (if any) is executed. A switch statement can also have a default case which is executed if none of the explicit cases match.

A switch statement looks like this:

	switch( expr )
		{
		case c1 :
			... code ...
			break;
		case c2 :
			... code ...
			break;
		case c3 :
			... code ...
			break;
		...
		default:
			... code ...
			break;
		}

The expression expr is evaluated. If one of the case labels (c1, c2, c3, etc., which must all be integral constants) matches, execution jumps there and continues until the next break statement (or the end of the switch). Otherwise, if there is a default label, execution jumps there instead, again continuing to the next break statement. Otherwise, none of the code in the switch statement is executed. (Yes, the break statement is also used to break out of loops. It breaks out of the innermost enclosing loop or switch statement.)

The switch statement only works on integral arguments and expressions (char, the various sizes of int, and enums, though we haven't met enums yet). There is no direct way to switch on strings, or on floating-point values. The target case labels must be specified explicitly; there is no general way to specify a case which corresponds to a range of values.

One peculiarity of the switch statement is that the break at the end of one case's block of code is optional. If you leave it out, control will "fall through" from one case to the next. Occasionally, this is what you want, but usually not, so remember to put a break statement after most cases. (Since falling through is so rare, many programmers highlight it, when they do mean to use it, with a comment like /* FALL THROUGH */, to indicate that it's not a mistake.) One way to make use of "fallthrough" is when you have a small set or range of cases which should all map to the same code. Since the case labels are just labels, and since there doesn't have to be a statement immediately following a case label, you can associate several case labels with one block of code:

	switch(x)
		{
		case 1:
			... code ...
			break;
		case 2:
			... code ...
			break;
		case 3:
		case 4:
		case 5:
			... code ...
			break;
		default:
			... code ...
			break;
		}

Here, the same chunk of code is executed when x is 3, 4, or 5.
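A natural use for grouped case labels is computing the number of days in a month. In this sketch (days_in_month is a hypothetical name), each case ends with a return rather than a break; a return leaves the whole function, so it leaves the switch, too. Whether the year is a leap year is left for the caller to decide.

```c
/* Returns the number of days in month m (1 = January ... 12 = December),
   or 0 for an invalid month. leap is nonzero for a leap year. */
int days_in_month(int m, int leap)
{
	switch(m)
		{
		case 4:
		case 6:
		case 9:
		case 11:
			return 30;
		case 2:
			return leap ? 29 : 28;
		case 1: case 3: case 5: case 7:
		case 8: case 10: case 12:
			return 31;
		default:
			return 0;
		}
}
```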

The case labels do not have to be in any particular order; the compiler is smart enough to find the matching one if it's there. The default case doesn't have to go at the end, either.

It's common to switch on characters:

	switch(c)
		{
		case '+':
			/* code for + */
			break;
		case '-':
			/* code for - */
			break;
		case '\n':
			/* code for newline */
			/* FALL THROUGH */
		case ' ':
		case '\t':
			/* code for other whitespace */
			break;
		case '0': case '1': case '2': case '3': case '4':
		case '5': case '6': case '7': case '8': case '9':
			/* code for digits */
			break;
		default:
			/* code for all other characters */
			break;
		}
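Filling in the comments in the character switch above gives a runnable sketch that counts classes of characters in a string; the function name count_chars and its counters are assumptions for illustration. Note that each break leaves only the switch, not the enclosing for loop, so the loop goes on to the next character.

```c
/* Counts digits, whitespace, newlines, and other characters in s.
   A newline is counted as a newline and then falls through to be
   counted as whitespace as well. */
void count_chars(const char *s, int *ndigit, int *nwhite, int *nnewline,
		int *nother)
{
	*ndigit = *nwhite = *nnewline = *nother = 0;
	for(; *s != '\0'; s++)
		{
		switch(*s)
			{
			case '\n':
				(*nnewline)++;
				/* FALL THROUGH */
			case ' ':
			case '\t':
				(*nwhite)++;
				break;
			case '0': case '1': case '2': case '3': case '4':
			case '5': case '6': case '7': case '8': case '9':
				(*ndigit)++;
				break;
			default:
				(*nother)++;
				break;
			}
		}
}
```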

It's also common to have a set of #defined values, and to switch on those:

	#define APPLE    1
	#define ORANGE   2
	#define CHERRY   3
	#define BROCCOLI 4

	...

	switch(fruit)
		{
		case APPLE:
			printf("turnover"); break;
		case ORANGE:
			printf("marmalade"); break;
		case CHERRY:
			printf("pie"); break;
		case BROCCOLI:
			printf("wait a minute... that's not a fruit"); break;
		}


18.3.2: do/while

[This section corresponds to K&R Sec. 3.6]

Briefly stated, a do/while loop is like a while loop, except that the body of the loop is always executed at least once, even if the condition is initially false. We'll motivate the usefulness of this loop with a slightly long example.

We know that the digit character '1' is not the same as the int value 1, and that the string "123" is not the same as the int value 123. We've learned that the atoi function will convert a string (containing digits) to the corresponding integer, and that we can use the sprintf function to generate a string of digits corresponding to an integer. Now let's see how we could convert an integer to a string of digits by hand, if for some reason we couldn't use sprintf but had to do it ourselves.

If the number were less than 10 and not negative, it would be easy. Since we know that the digit characters '0' to '9' have consecutive character set values, the expression i + '0' gives the character corresponding to i's value if i is an integer between 0 and 9, inclusive. So our very first stab at an integer-to-string routine, which would only work for one-digit numbers, might look like this:

	char string[2];
	string[0] = i + '0';
	string[1] = '\0';

(Remember, the null character \0 is required to terminate strings in C.)

The limitation to single-digit numbers is obviously not acceptable. Suppose we went a little further, and arranged to handle numbers less than 100, by using an if statement to choose between the 1-digit case and the 2-digit case:

	char string[3];
	if(i < 10)
		{
		string[0] = i + '0';
		string[1] = '\0';
		}
	else	{
		string[0] = (i / 10) + '0';
		string[1] = (i % 10) + '0';
		string[2] = '\0';
		}

In the two-digit case, the subexpression i % 10 gives us the value of the low-order (1's) digit of the result, and i / 10 gives us the high-order (10's) digit.

We've still got a pretty limited piece of code. If we kept extending it in this way, with explicit if statements depending on how many digits the number could have, we'd duplicate a lot of code and end up with quite a mess, and we wouldn't even know how many cases we'd need (at least 5, because type int is guaranteed to hold integers up to at least 32,767, but on some systems it can hold more). The right solution to this problem, therefore, involves a loop.

One way of thinking about if statements and while loops is that an if statement allows you to select a chunk of code which, if required, will complete some step towards the accomplishment of an overall task, while a while loop selects a chunk of code that will whittle away at some task or subtask, but without necessarily completing it on the first go, such that several trips through the loop might be required. Since the operation i % 10 does give us one digit of our answer, but since we may end up having many digits, our next attempt is to wrap the i % 10 and i / 10 code up in a while loop:

	char string[25];
	int j = 24;
	string[j] = '\0';
	while(i > 0)
		{
		string[--j] = (i % 10) + '0';
		i /= 10;
		}

Here we use an auxiliary variable j to keep track of which element of the string array we're filling in. We fill in the array from the end back towards the beginning, because successive remainders when dividing i by 10 give us digits in the reverse order (the reverse, that is, of the order we'd write the digits left-to-right). In this code, j holds the index of the element we've just filled in, so we use the predecrement form --j to decrement j before filling in the next digit. When we're done, string[j] will be the first (leftmost) digit of our result, so the finished string begins at &string[j]; the elements before it are unused. (For the string array as declared, i can have at most 24 digits, which is plenty even on 64-bit machines, where the largest int has 19 digits.)

The third try just above, using a while loop, will work just fine except in the case when i == 0. If i is 0, the controlling expression i > 0 of the while loop will immediately be false, and no trips through the loop will be taken. This means that the integer 0 will be converted to the empty string "", not the string "0". In this case, we would like to take one trip through the loop (to generate the digit 0) even though the condition is initially false.
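The failure is easy to confirm by wrapping the while-loop version in a function. In this sketch (the name int_to_str_while is hypothetical), the caller supplies a buffer of at least 25 chars and gets back a pointer to the first digit, which is where string[j] ended up.

```c
/* The while-loop converter from above. Assumes i >= 0 and that buf
   has room for at least 25 chars. Returns a pointer into buf. */
char *int_to_str_while(int i, char buf[])
{
	int j = 24;
	buf[j] = '\0';
	while(i > 0)
		{
		buf[--j] = (i % 10) + '0';
		i /= 10;
		}
	return &buf[j];		/* for i == 0, this is the empty string */
}
```

Calling int_to_str_while(123, buf) yields "123", but int_to_str_while(0, buf) yields "", because the loop body never runs.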

For loops like these, C has the do/while loop, which tests the condition at the bottom of the loop, after making the first trip through without checking. The syntax of the do/while loop is

	do statement
	while( expression );

The statement is almost always a brace-enclosed block of statements, because a do/while loop without braces looks odd even if there's only one statement in the body. Notice that there is a semicolon after the close parenthesis after the controlling expression.

Using a do/while loop, we can write our final version of the integer-to-string converter:

	char string[25];
	int j = 24;
	string[j] = '\0';
	do	{
		string[--j] = (i % 10) + '0';
		i /= 10;
		} while(i > 0);

This version is now almost perfect; its only deficiency is that it has no provision for negative numbers.
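One way to remove that remaining deficiency is to convert the magnitude of the number and then prepend a minus sign. This is our own extension, not part of the example above, and the name int_to_str is hypothetical. It computes the magnitude as an unsigned int, so that even the most negative int, whose magnitude doesn't fit in an int, converts correctly.

```c
/* The do/while converter, extended to handle negative numbers.
   buf needs room for at least 25 chars; returns a pointer into buf. */
char *int_to_str(int i, char buf[])
{
	/* Negating in unsigned arithmetic is well defined for every int. */
	unsigned int u = (i < 0) ? -(unsigned int)i : (unsigned int)i;
	int j = 24;

	buf[j] = '\0';
	do	{
		buf[--j] = (u % 10) + '0';
		u /= 10;
		} while(u > 0);

	if(i < 0)
		buf[--j] = '-';

	return &buf[j];
}
```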

C's do/while loop is analogous to the repeat/until loop in Pascal. (C's while loop, on the other hand, is like Pascal's while/do loop.)


18.3.3: goto

[This section corresponds to K&R Sec. 3.8]

Finally, C has a goto statement, and labels to go to. You will hear many people disparage goto statements, and without taking a stand on whether they are inherently evil, I will say that most code can be written, and written pretty cleanly, without them.

You can attach a label to any statement:

		statement1;
	label1:	statement2;
		statement3;
	label2:	statement4;
		statement5;

A label is simply an identifier followed by a colon. The names for labels follow the same rules as for variables and other identifiers (they consist of letters, digit characters, and underscores, generally beginning with a letter, in any case not beginning with a digit). A label can in principle have the same name as a variable; variables and labels are quite distinct, so the compiler can keep them separate. Label names must obviously be unique within a function.

Anywhere you want, you can say

	goto label ;

where label is one of the labels in the current function, and control flow will jump to that point. You can only jump around within the same function; you can't go to a label in some other function. (If goto isn't enough for you, if for some reason you must jump back to another function, there's a library function, longjmp, which will do so under certain circumstances.) If you jump into a brace-enclosed block of statements, and if that block has local, automatic variables which have initializers (that is, attached to their declarations), the initializers will not take effect. (In other words, initializers for block-local variables take effect only when the block is entered normally, from the top.)

For the vast majority of functions, the more "structured" if/else, switch, and loop statements, perhaps augmented with break and continue statements, will accomplish the required control flow, in a clean and obvious way. (Without practice, it may not always be immediately obvious how to structure a piece of code as, say, a clean loop, but the more important point is that once it's structured as a clean loop, it will be more obvious to a later reader what it's doing.) Remember, too, that function calls and return statements also accomplish control flow (analogous to the GOSUB in BASIC) in a clean and structured way. The complaint about goto is that when it's used without restraint, a tangled mess of unwieldy "spaghetti code" often results. There are really only two times you ever need to use a goto statement in real programs:

  1. To break out of several loops at once. (When you have nested loops, the break statement only breaks you out of the innermost loop it's in.)
  2. To jump to the end of a function, to perform some cleanup code or something, but bypassing the rest of the function. Usually, such a jump-to-the-end happens as a result of some error condition which the function has detected.
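Both uses can be sketched briefly. The first function (a hypothetical find_in_grid) searches a two-dimensional grid, stored row by row in a flat array, and uses goto to leave both loops as soon as it finds a match. The second (an equally hypothetical file_size_by_reading, which measures a file by reading it in BUFSIZ-sized chunks) jumps to a single block of cleanup code on any error, so that whatever was opened or allocated is released on every path.

```c
#include <stdio.h>
#include <stdlib.h>

/* Use 1: leaving nested loops. Searches a grid of rows * cols ints for
   target. On success, stores the position in *row and *col and returns 1;
   otherwise returns 0. */
int find_in_grid(const int grid[], int rows, int cols, int target,
		int *row, int *col)
{
	int r, c;

	for(r = 0; r < rows; r++)
		for(c = 0; c < cols; c++)
			if(grid[r * cols + c] == target)
				goto found;	/* a break would leave only the inner loop */
	return 0;

found:
	*row = r;
	*col = c;
	return 1;
}

/* Use 2: jumping to common cleanup code. Returns the number of bytes in
   the named file, found by reading it in chunks, or -1 on any error. */
long file_size_by_reading(const char *filename)
{
	long total = -1;
	FILE *fp;
	char *buf = NULL;
	size_t n;

	fp = fopen(filename, "rb");
	if(fp == NULL)
		goto cleanup;
	buf = malloc(BUFSIZ);
	if(buf == NULL)
		goto cleanup;

	total = 0;
	while((n = fread(buf, 1, BUFSIZ, fp)) > 0)
		total += (long)n;
	if(ferror(fp))
		total = -1;	/* a read error: report failure */

cleanup:
	free(buf);		/* free(NULL) does nothing */
	if(fp != NULL)
		fclose(fp);
	return total;
}
```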

It's customary for an author to claim that, although goto exists, "it has not been used in this book", or that the author can count on the fingers of one hand the number of times he's ever used goto in C, and in fact I think I can make these claims myself.

