Character Strings in C: Character Operations

Character variables and constants are frequently used in relational and arithmetic expressions. To properly use characters in such situations, it is necessary for you to understand how they are handled by the C compiler.

Whenever a character constant or variable is used in an expression in C, it is automatically converted to, and subsequently treated as, an integer value.

In Chapter 6, “Making Decisions,” you saw how the expression

c >= 'a' && c <= 'z'

could be used to determine if the character variable c contained a lowercase letter. As mentioned there, such an expression could be used on systems that used an ASCII character representation because the lowercase letters are represented sequentially in ASCII, with no other characters in between. The first part of the preceding expression, which compares the value of c against the value of the character constant 'a', is actually comparing the value of c against the internal representation of the character 'a'. In ASCII, the character 'a' has the value 97, the character 'b' has the value 98, and so on. Therefore, the expression c >= 'a' is TRUE (nonzero) for any lowercase character contained in c because it has a value that is greater than or equal to 97. However, because there are characters other than the lowercase letters whose ASCII values are greater than 97 (such as the open and close braces), the test must be bounded on the other end to ensure that the result of the expression is TRUE for lowercase characters only. For this reason, c is compared against the character 'z', which, in ASCII, has the value 122.

Because comparing the value of c against the characters 'a' and 'z' in the preceding expression actually compares c to the numerical representations of 'a' and 'z', the expression

c >= 97 && c <= 122

could be equivalently used to determine if c is a lowercase letter. The first expression is preferred, however, because it does not require knowledge of the specific numerical values of the characters 'a' and 'z', and because its intent is clearer.
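The two forms of the test can be sketched side by side as follows. The function names isLowerAscii and isLowerByCode are hypothetical, introduced here only for illustration, and both functions assume a character set such as ASCII in which the lowercase letters are contiguous.

```c
#include <stdbool.h>

// Hypothetical helper: true if c is a lowercase letter.
// Assumes the lowercase letters are contiguous, as they are in ASCII.
bool isLowerAscii (char c)
{
    return c >= 'a' && c <= 'z';
}

// The same test written with the ASCII codes directly
// (97 is 'a' and 122 is 'z'). Valid only on ASCII systems.
bool isLowerByCode (char c)
{
    return c >= 97 && c <= 122;
}
```

On an ASCII system the two functions agree for every character, but only the first states what is actually being tested.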

The printf call

printf ("%i\n", c);

can be used to print out the value that is used to internally represent the character stored inside c. If your system uses ASCII, the statement

printf ("%i\n", 'a');

displays 97, for example.

Try to predict what the following two statements would produce:

c = 'a' + 1;
printf ("%c\n", c);

Because the value of 'a' is 97 in ASCII, the effect of the first statement is to assign the value 98 to the character variable c. Because this value represents the character 'b' in ASCII, this is the character that is displayed by the printf call.
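As a minimal sketch of this arithmetic, the two hypothetical helpers below (nextChar and showChar are names invented for this illustration) add one to a character and display a character next to the integer that represents it.

```c
#include <stdio.h>

// Hypothetical helper: returns the character whose internal value is
// one greater than that of c.
char nextChar (char c)
{
    // c is converted to an integer, 1 is added, and the sum is
    // converted back to a char on return
    return c + 1;
}

// Hypothetical helper: displays a character followed by the integer
// value used to represent it internally.
void showChar (char c)
{
    printf ("%c = %i\n", c, c);
}
```

On a system that uses ASCII, nextChar ('a') returns 'b', and showChar ('a') displays a = 97.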

Although adding one to a character constant hardly seems practical, the preceding example gives way to an important technique that is used to convert the characters '0' through '9' into their corresponding numerical values 0 through 9. Recall that the character '0' is not the same as the integer 0, the character '1' is not the same as the integer 1, and so on. In fact, the character '0' has the numerical value 48 in ASCII, which is what is displayed by the following printf call:

printf ("%i\n", '0');

Suppose the character variable c contains one of the characters '0' through '9' and that you want to convert this value into the corresponding integer 0 through 9. Because the digits of virtually all character sets are represented by sequential integer values, you can easily convert c into its integer equivalent by subtracting the character constant '0' from it. Therefore, if i is defined as an integer variable, the statement

i = c - '0';

has the effect of converting the character digit contained in c into its equivalent integer value. Suppose c contained the character '5', which, in ASCII, is the number 53. The ASCII value of '0' is 48, so execution of the preceding statement results in the integer subtraction of 48 from 53, which results in the integer value 5 being assigned to i. On a machine that uses a character set other than ASCII, the same result is obtained even though the internal representations of '5' and '0' might differ, because the C standard requires the digit characters '0' through '9' to have consecutive values.
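The conversion can be wrapped in a small function, sketched below. The name digitValue and the choice of -1 as the return value for a nondigit are assumptions made for this illustration.

```c
// Hypothetical helper: converts a digit character to its integer value.
// Returns -1 if c is not one of the characters '0' through '9'.
int digitValue (char c)
{
    if ( c >= '0' && c <= '9' )
        return c - '0';     // distance of c from '0' in the character set

    return -1;
}
```

With this function, digitValue ('5') returns 5, and digitValue ('x') returns -1 to signal that the argument was not a digit.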

The preceding technique can be extended to convert a character string consisting of digits into its equivalent numerical representation. This has been done in Program 10.11, in which a function called strToInt is presented to convert the character string passed as its argument into an integer value. The function ends its scan of the character string after a nondigit character is encountered and returns the result back to the calling routine. It is assumed that an int variable is large enough to hold the value of the converted number.

Program 10.11 Converting a String to its Integer Equivalent

// Function to convert a string to an integer

#include <stdio.h>

int strToInt (const char string[])
{
    int i, intValue, result = 0;

    // Scan until the first nondigit character (including the terminating null)
    for ( i = 0; string[i] >= '0' && string[i] <= '9'; ++i )
    {
        intValue = string[i] - '0';         // digit character to integer value
        result = result * 10 + intValue;    // shift earlier digits, add new one
    }

    return result;
}

int main (void)
{
    int strToInt (const char string[]);

    printf ("%i\n", strToInt("245"));
    printf ("%i\n", strToInt("100") + 25);
    printf ("%i\n", strToInt("13x5"));

    return 0;
}

Program 10.11 Output

245
125
13

The for loop is executed as long as the character contained in string[i] is a digit character. Each time through the loop, the character contained in string[i] is converted into its equivalent integer value and is then added into the value of result multiplied by 10. To see how this technique works, consider execution of this loop when the function is called with the character string "245" as an argument: The first time through the loop, intValue is assigned the value of string[0] - '0'. Because string[0] contains the character '2', this results in the value 2 being assigned to intValue. Because the value of result is 0 the first time through the loop, multiplying it by 10 produces 0, which is added to intValue and stored back in result. So, by the end of the first pass through the loop, result contains the value 2.

The second time through the loop, intValue is set equal to 4, as calculated by subtracting '0' from '4'. Multiplying result by 10 produces 20, which is added to the value of intValue, producing 24 as the value stored in result.

The third time through the loop, intValue is equal to '5' - '0', or 5, which is added into the value of result multiplied by 10 (240). Thus, the value 245 is the value of result after the loop has been executed for the third time.

Upon encountering the terminating null character, the for loop is exited and the value of result, 245, is returned to the calling routine.
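The pass-by-pass values described above can be checked with a tracing variant of the function, sketched here. The name strToIntSteps and its steps and maxSteps parameters are hypothetical additions for this illustration and are not part of Program 10.11.

```c
// Hypothetical tracing variant of strToInt: stores the value of result
// after each pass through the loop in steps[] (at most maxSteps entries)
// and returns the number of passes recorded.
int strToIntSteps (const char string[], int steps[], int maxSteps)
{
    int i, result = 0;

    for ( i = 0; i < maxSteps && string[i] >= '0' && string[i] <= '9'; ++i )
    {
        result = result * 10 + (string[i] - '0');
        steps[i] = result;      // value of result at the end of this pass
    }

    return i;
}
```

Called with the string "245" and an array of at least three elements, the function returns 3 and leaves the values 2, 24, and 245 in the first three elements of the array.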

The strToInt function could be improved in two ways. First, it doesn’t handle negative numbers. Second, it doesn’t let you know whether the string contained any valid digit characters at all. For example, strToInt ("xxx") returns 0. These improvements are left as an exercise.
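One possible approach to both improvements is sketched below; it is only one of several reasonable designs. The name strToIntChecked and the ok parameter are assumptions made for this illustration. As in Program 10.11, it is still assumed that an int is large enough to hold the converted value.

```c
#include <stdbool.h>

// Hypothetical variant of strToInt: accepts an optional leading '-' and
// sets *ok to true only if at least one digit character was converted.
int strToIntChecked (const char string[], bool *ok)
{
    int i = 0, result = 0;
    bool negative = false, sawDigit = false;

    if ( string[0] == '-' )
    {
        negative = true;
        i = 1;              // begin scanning after the minus sign
    }

    for ( ; string[i] >= '0' && string[i] <= '9'; ++i )
    {
        result = result * 10 + (string[i] - '0');
        sawDigit = true;
    }

    *ok = sawDigit;

    return negative ? -result : result;
}
```

With this version, the string "-245" yields -245 with ok set to true, whereas "xxx" yields 0 with ok set to false, so the caller can tell a genuine zero from a failed conversion.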

This discussion concludes this chapter on character strings. As you can see, C provides capabilities that enable character strings to be efficiently and easily manipulated. The standard C library contains a wide variety of functions for performing operations on strings and characters. For example, it offers the function strlen to calculate the length of a character string, strcmp to compare two strings, strcat to concatenate two strings, strcpy to copy one string to another, atoi to convert a string to an integer, and isupper, islower, isalpha, and isdigit to test whether a character is uppercase, lowercase, alphabetic, or a digit. A good exercise is to rewrite the examples from this chapter to make use of these routines. Consult Appendix B, “The Standard C Library,” which lists many of the functions available from the library.
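As a brief sketch of how some of these routines fit together, the hypothetical function countDigits below (a name invented for this illustration) uses strlen and isdigit in place of hand-written loops and comparisons. The cast to unsigned char is there because the functions in <ctype.h> expect a value representable as an unsigned char (or EOF).

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: counts the digit characters in a string,
// using strlen for the length and isdigit for the character test.
int countDigits (const char string[])
{
    int count = 0;
    size_t i, length = strlen (string);

    for ( i = 0; i < length; ++i )
        if ( isdigit ((unsigned char) string[i]) )
            ++count;

    return count;
}
```

In the same spirit, the library call atoi ("245") returns 245, the value that strToInt ("245") produces in Program 10.11.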

Source: Kochan Stephen G. (2004), Programming in C: A Complete Introduction to the C Programming Language, Sams; Subsequent edition.
