Character Strings in C: Character Strings, Structures, and Arrays

You can combine the basic elements of the C programming language to form very pow- erful programming constructs in many ways. In Chapter 9, “Working with Structures,” for example, you saw how you could easily define an array of structures. Program 10.9 further illustrates the notion of arrays of structures, combined with the variable-length character string.

Suppose you want to write a computer program that acts like a dictionary. If you had such a program, you could use it whenever you came across a word whose meaning was not clear.You could type the word into the program, and the program could then auto- matically “look up” the word inside the dictionary and tell you its definition.

If you contemplate developing such a program, one of the first thoughts that comes to mind is the representation of the word and its definition inside the computer. Obviously, because the word and its definition are logically related, the notion of a struc- ture comes immediately to mind.You can define a structure called entry, for example, to hold the word and its definition:

struct entry

{

char word[15];

char definition[50];

};

In the preceding structure definition, you have defined enough space for a 14-letter word (remember, you are dealing with variable-length character strings, so you need to leave room for the null character) plus a 49-character definition. The following is an example of a variable defined to be of type struct entry that is initialized to contain the word “blob” and its definition.

struct entry word1 = { “blob”, “an amorphous mass” };

Because you want to provide for many words inside your dictionary, it seems logical to define an array of entry structures, such as in

struct entry dictionary[100];

which allows for a dictionary of 100 words. Obviously, this is far from sufficient if you are interested in setting up an English language dictionary, which requires at least 100,000 entries to be of any value. In that case, you would probably adopt a more sophisticated approach, one that would typically involve storing the dictionary on the computer’s disk, as opposed to storing its entire contents in memory.

Having defined the structure of your dictionary, you should now think a bit about its organization. Most dictionaries are organized alphabetically. It makes sense to organize yours the same way. For now, assume that this is because it makes the dictionary easier to read. Later, you see the real motivation for such an organization.

Now, it’s time to think about the development of the program. It is convenient to define a function to look up a word inside the dictionary. If the word is found, the func- tion could return the entry number of the word inside the dictionary; otherwise, the function could return –1 to indicate that the word was not found in the dictionary. So, a typical call to this function, which you can call lookup, might appear as follows:

entry = lookup (dictionary, word, entries);

In this case, the lookup function searches dictionary for the word as contained in the character string word. The third argument, entries, represents the number of entries in the dictionary. The function searches the dictionary for the specified word and returns the entry number in the dictionary if the word is found, or returns –1 if the word is not found.

In Program 10.9, the lookup function uses the equalStrings function defined in Program 10.4 to determine if the specified word matches an entry in the dictionary.

Program 10.9 Using the Dictionary Lookup Program

// Program to use the dictionary lookup program

#include <stdio.h>

#include <stdbool.h>

struct entry

{

char word[15];

char definition[50];

};

/***** Insert equalStrings function here *****/

// function to look up a word inside a dictionary

int lookup (const struct entry dictionary[], const char search[], const int entries)

{

int i;

bool equalStrings (const char s1[], const char s2[]);

for ( i = 0; i < entries; ++i )

if ( equalStrings (search, dictionary[i].word) )

return i;

return -1;

}

int main (void)

{

const struct entry dictionary[100] =

{ { “aardvark”, “a burrowing African mammal” },

{ “abyss”, “a bottomless pit” },

{ “acumen”, “mentally sharp; keen” },

{ “addle”, “to become confused” },

{ “aerie”, “a high nest” },

{ “affix”, “to append; attach” },

{ “agar”, “a jelly made from seaweed” },

{ “ahoy”, “a nautical call of greeting” },

{ “aigrette”, “an ornamental cluster of feathers” },

{ “ajar”, “partially opened” } };

char word[10];

int entries = 10;

int entry;

int lookup (const struct entry dictionary[], const char search[],

const int entries);

printf (“Enter word: “);

scanf (“%14s”, word);

entry = lookup (dictionary, word, entries);

if ( entry != -1 )

printf (“%s\n”, dictionary[entry].definition);

else

printf (“Sorry, the word %s is not in my dictionary.\n”, word);

return 0;

}

Program 10.9 Output

Enter word: agar

a jelly made from seaweed

Program 10.9 Output (Rerun)

Enter word: accede

Sorry, the word accede is not in my dictionary.

The lookup function sequences through each entry in the dictionary. For each such entry, the function calls the equalStrings function to determine if the character string search matches the word member of the particular dictionary entry. If it does match, the function returns the value of the variable i, which is the entry number of the word that was found in the dictionary. The function is exited immediately upon execution of the return statement, despite the fact that the function is in the middle of executing a for loop.

If the lookup function exhausts all the entries in the dictionary without finding a match, the return statement after the for loop is executed to return the “not found” indication (–1) back to the caller.

1. A Better Search Method

The method used by the lookup function to search for a particular word in the diction- ary is straightforward enough; the function simply performs a sequential search through all the entries in the dictionary until either a match is made or the end of the dictionary is reached. For a small-sized dictionary like the one in your program, this approach is perfectly fine. However, if you start dealing with large dictionaries containing hundreds or perhaps even thousands of entries, this approach might no longer be sufficient because of the time it takes to sequentially search through all of the entries. The time required can be considerable—even though considerable in this case could mean only a fraction of a second. One of the prime considerations that must be given to any sort of informa- tion retrieval program is that of speed. Because the searching process is one that is so frequently used in computer applications, much attention has been given by computer scientists to developing efficient algorithms for searching (about as much attention as has been given to the process of sorting).

You can make use of the fact that your dictionary is in alphabetical order to develop a more efficient lookup function. The first obvious optimization that comes to mind is in the case that the word you are looking for does not exist in the dictionary.You can make your lookup function “intelligent” enough to recognize when it has gone too far in its search. For example, if you look up the word “active” in the dictionary defined in Program 10.9, as soon as you reach the word “acumen,” you can conclude that “active” is not there because, if it was, it would have appeared in the dictionary before the word “acumen.”

As was mentioned, the preceding optimization strategy does help to reduce your search time somewhat, but only when a particular word is not present in the dictionary. What you are really looking for is an algorithm that reduces the search time in most cases, not just in one particular case. Such an algorithm exists under the name of the binary search.

The strategy behind the binary search is relatively simple to understand. To illustrate how this algorithm works, take an analogous situation of a simple guessing game. Suppose I pick a number from 1 to 99 and then tell you to try to guess the number in the fewest number of guesses. For each guess that you make, I can tell you if you are too low, too high, or if your guess is correct. After a few tries at the game, you will probably realize that a good way to narrow in on the answer is by using a halving process. For example, if you take 50 as your first guess, an answer of either “too high” or “too low” narrows the possibilities down from 100 to 49. If the answer was “too high,” the number must be from 1 to 49, inclusive; if the answer was “too low,” the number must be from 51 to 99, inclusive.

You can now repeat the halving process with the remaining 49 numbers. So if the first answer was “too low,” the next guess should be halfway between 51 and 99, which is 75. This process can be continued until you finally narrow in on the answer. On the average, this procedure takes far less time to arrive at the answer than any other search method.

The preceding discussion describes precisely how the binary search algorithm works. The following provides a formal description of the algorithm. In this algorithm, you are looking for an element x inside an array M, which contains n elements. The algorithm assumes that the array M is sorted in ascending order.

Binary Search Algorithm

Step 1: Set low to 0, high to n – 1.

Step 2: If low > high, x does not exist in M and the algorithm terminates.

Step 3: Set mid to (low + high) / 2.

Step 4: If M[mid] < x, set low to mid + 1 and go to step 2.

Step 5: If M[mid] > x, set high to mid – 1 and go to step 2.

Step 6: M[mid] equals x and the algorithm terminates.

The division performed in step 3 is an integer division, so if low is 0 and high is 49, the value of mid is 24.

Now that you have the algorithm for performing a binary search, you can rewrite your lookup function to use this new search strategy. Because the binary search must be able to determine if one value is less than, greater than, or equal to another value, you might want to replace your equalStrings function with another function that makes this type of determination for two character strings. Call the function compareStrings and have it return the value –1 if the first string is lexicographically less than the second string, 0 if the two strings are equal, and 1 if the first string is lexicographically greater than the second string. So, the function call

compareStrings (“alpha”, “altered”)

returns the value –1 because the first string is lexicographically less than the second string (think of this to mean that the first string occurs before the second string in a dic- tionary). And, the function call

compareStrings (“zioty”, “yucca”);

returns the value 1 because “zioty” is lexicographically greater than “yucca.”

In Program 10.10, the new compareStrings function is presented. The lookup func- tion now uses the binary search method to scan through the dictionary. The main rou- tine remains unchanged from the previous program.

Program 10.10 Modifying the Dictionary Lookup Using Binary Search

// Dictionary lookup program

#include <stdio.h>

struct entry

{

char word[15];

char definition[50];

};

// Function to compare two character strings

int compareStrings (const char s1[], const char s2[])

{

int i = 0, answer;

while ( s1[i] == s2[i] && s1[i] != ‘\0’&& s2[i] != ‘\0’ )

++i;

if ( s1[i] < s2[i] )

answer = -1; /* s1 < s2 */

else if ( s1[i] == s2[i] )

answer = 0; /* s1 == s2 */

else

answer = 1; /* s1 > s2 */

return answer;

}

// Function to look up a word inside a dictionary

int lookup (const struct entry dictionary[], const char search[], const int entries)

{

int low = 0;

int high = entries – 1;

int mid, result;

int compareStrings (const char s1[], const char s2[]);

while ( low <= high )

{

mid = (low + high) / 2;

result = compareStrings (dictionary[mid].word, search);

if ( result == -1 )

low = mid + 1;

else if ( result == 1 )

high = mid – 1;

else

return mid; /* found it */

}

return -1; /* not found */

}

int main (void)

{

const struct entry dictionary[100] =

{ { “aardvark”, “a burrowing African mammal” },

{ “abyss”, “a bottomless pit” },

{ “acumen”, “mentally sharp; keen” },

{ “addle”, “to become confused” },

{ “aerie”, “a high nest” },

{ “affix”, “to append; attach” },

{ “agar”, “a jelly made from seaweed” },

{ “ahoy”, “a nautical call of greeting” },

{ “aigrette”, “an ornamental cluster of feathers” },

{ “ajar”, “partially opened” } };

int entries = 10;

char word[15];

int entry;

int lookup (const struct entry dictionary[], const char search[], const int entries);

printf (“Enter word: “);

scanf (“%14s”, word);

entry = lookup (dictionary, word, entries);

if ( entry != -1 )

printf (“%s\n”, dictionary[entry].definition);

else

printf (“Sorry, the word %s is not in my dictionary.\n”, word);

return 0;

}

Program 10.10 Output

Enter word: aigrette

an ornamental cluster of feathers

Program 10.10 Output (Rerun)

Enter word: acerb

Sorry, that word is not in my dictionary.

The compareStrings function is identical to the equalStrings function up through the end of the while loop. When the while loop is exited, the function analyzes the two characters that resulted in the termination of the while loop. If s1[i] is less than s2[i], s1 must be lexicographically less than s2. In such a case, –1 is returned. If s1[i] is equal to s2[i], the two strings are equal so 0 is returned. If neither is true, s1 must be lexico- graphically greater than s2, in which case 1 is returned.

The lookup function defines int variables low and high and assigns them initial val- ues defined by the binary search algorithm. The while loop executes as long as low does not exceed high. Inside the loop, the value mid is calculated by adding low and high and dividing the result by 2. The compareStrings function is then called with the word contained in dictionary[mid] and the word you are searching for as arguments. The returned value is assigned to the variable result.

If compareStrings returns a value of –1—indicating that dictionary[mid].word is less than search—lookup sets the value of low to mid + 1. If compareStrings returns 1—indicating that dictionary[mid].search is greater than search—lookup sets the value of high to mid – 1. If neither –1 nor 1 is returned, the two strings must be equal, and, in that case, lookup returns the value of mid, which is the entry number of the word in the dictionary.

If low eventually exceeds high, the word is not in the dictionary. In that case, lookup returns –1 to indicate this “not found” condition.

Source: Kochan Stephen G. (2004), Programming in C: A Complete Introduction to the C Programming Language, Sams; Subsequent edition.

1. A Better Search Method

Leave a Reply Cancel reply

Login