Sequences in Python: Strings

A string is a sequence of characters. The word sequence implies that the order of the characters within the string matters, and that is certainly true. Strings most often represent the way that communication between a computer and a human takes place. Human language consists of words and phrases, and each word or phrase is a string within a program. The order of the characters within a word matters a great deal to a human because some sequences are words and others are not. The string “last” is a word, but “astl” is not. The strings “salt” and “slat” are words and use exactly the same characters as “last,” but these characters are arranged in a different order.

Because order matters, the representation of a string on a computer imposes an order on the characters within, and so there is a first character, a second, and so on, and it should be possible to access each character individually. A string also has a length, which is the number of characters within it. A computer language provides specific actions that can be done to a string: these are called operations, and a type is defined at least partly by what operations can be done to something of that type. Because a string represents text in the human sense, the operations on strings should represent the kinds of things that would be done to text. This includes printing and reading, accessing any character, linking strings into longer strings, and searching a string for a particular word.

The examples of code written so far use only string constants. These are simply characters enclosed in either single or double quotes. Assigning a string constant to a variable causes that variable to have the string type and gives it a value. The statements

name = "John Doe"
address = '121 Second Street'

cause the variables named name and address to be strings with the assigned value. Note that either type of quote can be used, but a string that begins with a double quote must end with one.

A string behaves as if its characters are stored consecutively in memory. The first character in a string is at location, or index, 0 and can be accessed using square brackets after the string name. Using the definitions above, name[0] is “J” and name[5] is “D.” If an index is specified that is too large, it results in an error because it amounts to an attempt to look past the end of the string.

How many characters are there in the string name? The built-in function len() returns the length of the string. The largest legal index is one less than this value: the first character of a string name has index 0, and the final one has index 7; the length is 8. Thus, any index between 0 and len(name)-1 is legal. The following code prints all of the characters of name and can be thought of as the basic pattern for code that scans through the characters in strings:

for i in range(0, len(name)):
    print(name[i], end="")

This may be a little confusing, but remember that the range(0,n) does not include n. This loop runs through values of i from 0 to len(name)-1.

Some languages have a character type, but Python does not. A string of length one is what Python uses instead. A component of a string is therefore another string. The first character of the string name, which is name[0], is “J,” the string containing only one character.
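A short sketch of indexing and length, using the name string defined above. The try/except around the out-of-range access is added here only so that the example runs to completion; it is an illustration, not part of the original listing.

```python
name = "John Doe"

first = name[0]              # "J": indexing starts at 0
sixth = name[5]              # "D": index 4 holds the space
last = name[len(name) - 1]   # "e": the largest legal index is len(name) - 1

# A single character is itself a string of length one.
print(type(first).__name__, len(first))

# An index past the end raises IndexError.
try:
    name[len(name)]
    out_of_range = False
except IndexError:
    out_of_range = True
print(out_of_range)
```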

1. Comparing Strings

Two strings can be compared in the same manner as are two integers or real numbers, by using one of the relational operators ==, !=, <, >, <=, or >=. What it means for two strings to be equal is simple and reasonable: if each corresponding character in two strings is the same, then the strings are equal. That is, for strings a and b, if a[0] == b[0], and a[1]==b[1], and so on to the final character n, and a[n] == b[n], then the two strings a and b are equal and a==b. Otherwise a!=b. By the way, this implies that equal strings have the same length.

What about inequalities? Strings in real life are often sorted in alphabetical order. Names in a telephone book, files in a doctor’s office, and books in a store tend to appear in a logical order based on the alphabet. This is also true in Python. The string “abc” is less than the string “def,” for example. Why? Because the first letter in “abc” comes before the first letter in “def”; in other words, "abc"[0] < "def"[0]. Yes, characters in string constants can be accessed using their index.

A string s1 is less than a string s2 if all characters from 0 through k in the two strings are equal and s1[k+1] < s2[k+1]. Therefore, the following statements are true:

"abcd" < "abce"
"123" < "345"
"ab " < "abc"

In the last example, the space character " " is smaller than (i.e., comes before) the letter "c". What if the strings are not the same length? "ab" < "abc" is true, so if two strings are equal up to the end of one of them, then the shorter one is considered to be smaller. These rules are consistent so far with those taught in grade school for alphabetization. In Python, unlike in everyday alphabetization, trailing spaces do matter: by the shorter-string rule, "ab" < "ab ". Leading spaces matter as well, because a space comes before any alphabetic character; that is, " " < "a". Thus "ab" > " z".
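The ordering rules can be checked directly. The sketch below also calls the built-in ord(), which the text has not introduced, to show the numeric character codes that Python compares; treat that part as an illustrative aside.

```python
# Each comparison below evaluates to True.
checks = [
    "abcd" < "abce",   # first difference is at index 3: "d" < "e"
    "123" < "345",
    "ab " < "abc",     # a space sorts before any letter
    "ab" < "abc",      # a prefix is smaller than the longer string
    "ab" > " z",       # the leading space makes " z" smaller
]
print(all(checks))     # True

# Characters are ordered by their numeric codes:
# space, then digits, then uppercase, then lowercase.
print(ord(" "), ord("1"), ord("J"), ord("a"))   # 32 49 74 97
```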

Digits come before lowercase letters. “1” < “a,” and “1a” < “a1.” Most importantly, uppercase letters come before lowercase letters, so “John” < “john.” All of these rules are consistent with those that secretaries understand when filing paper documents. As an example that compares strings, consider the following:

a = "J"
b = "j"
c = "1"
if b < c:
    print("Lcase < numbers")
else:
    print("Lcase > numbers")
if a < c:
    print("Ucase < numbers")
else:
    print("Ucase > numbers")

This results in the following output:

Lcase > numbers

Ucase > numbers

Problem: Does a city name, entered at the console, come before or after the name Denver?

This involves reading a string and comparing it against the constant string “Denver.” Let the input string be read into a variable named city. Then the answer is as follows:

city = input()
if city < "Denver":
    print("The name given comes before Denver in an alphabetic list")
elif city > "Denver":
    print("The name given comes after Denver in an alphabetic list")
else:
    print("The name given was Denver")

If “Chicago” is typed at the console as input, the result is as follows:

Chicago

The name given comes before Denver in an alphabetic list

However, if case is ignored and “chicago” is typed instead, then the result is as follows:

chicago

The name given comes after Denver in an alphabetic list

because, of course, the lowercase “c” comes (as do all lowercase letters) after the uppercase “D” at the beginning of “Denver.”
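If the comparison should ignore case, one option is to convert both strings to lowercase first. This is a sketch only: it relies on the lower() method described under String Methods, and the function name compare_to_denver is a hypothetical helper introduced for illustration.

```python
def compare_to_denver(city):
    """Return 'before', 'after', or 'same', ignoring letter case."""
    a = city.lower()
    b = "Denver".lower()
    if a < b:
        return "before"
    elif a > b:
        return "after"
    return "same"

print(compare_to_denver("Chicago"))  # before
print(compare_to_denver("chicago"))  # before
print(compare_to_denver("DENVER"))   # same
```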

2.  Slicing – Extracting Parts of Strings

To a person, a string usually contains words and phrases, which are smaller parts of a string. Identifying individual words is important. To Python, this is true also. A Python program consists of statements that contain individual words and character sequences that each have a particular meaning. The words “if,” “while,” and “for” are good examples. Individual characters can be referenced through indexing, but can words or collections of characters be accessed? Yes, they can be accessed if the location (index) of the word is known.

Problem: Identify a “print” statement in a string.

The statement

print ("Lcase < numbers")

appears in the example program above. This can be thought of as a string and assigned to a variable:

statement = 'print ("Lcase < numbers")'

Is this a print statement? It is if the first five characters are the word “print.” Each of those characters could be tested individually using the following code:

if statement[0] == 'p':
    if statement[1] == 'r':
        if statement[2] == 'i':
            if statement[3] == 'n':
                if statement[4] == 't':
                    if statement[5] == ' ':
                        # This is a print statement.
                        pass

This is not an attractive format, and since this is something that is needed often enough, Python offers a better way to write it. A slice is a set of contiguous characters within a string. This means their indices are consecutive, and they can be accessed as a sequence by specifying the range of indices within brackets. The situation above concerning the print statement could be written like this:

if statement[0:5] == "print":

The slice here does not include character 5, but is 5 characters long, including the characters 0 through 4, inclusive. A slice from i to j (i.e., x[i:j]) does not include character j. This means that the following statements produce the same result:

fname[0]

fname[0:1]

If the first index is omitted, then the start of the string (index 0) is assumed, so the statement

if statement[0:5] == "print":

is the same as

if statement[:5] == "print":

If the second index is omitted, then the slice runs through the end of the string, that is, through the final character. The assignment

str = statement[6:]

results in the value of str being the string ("Lcase < numbers"), parentheses and quotes included. Both indices can be omitted, which means we use everything from the first to the last character, or the entire string.
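The slicing rules above can be tried out on the same statement string. This is a small sketch for experimentation; the expected values are shown in the comments.

```python
statement = 'print ("Lcase < numbers")'

print(statement[0:5])    # print
print(statement[:5])     # print  (first index omitted)
print(statement[6:])     # ("Lcase < numbers")  (second index omitted)

# Omitting both indices gives the whole string.
print(statement[:] == statement)        # True

# x[0] and x[0:1] give the same one-character string.
print(statement[0] == statement[0:1])   # True
```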

3. Editing Strings

Python does not allow the modification of individual parts of a string. That is, statements like

str[3] = "#"
str[2:3] = ".."

are not allowed. How can strings be modified? For example, consider the string variable

fname = "image"

If this is supposed to be the name of a JPG image file, then it must end with the suffix “.jpg.”

Problem: Create a JPEG file name from a basic string

The string fname can be edited to end with “.jpg” in a few ways, but the easiest one to use is the concatenation operator, +.

To concatenate means “to link or join together.” If the variables a and b are strings, then a+b is the string consisting of all characters in a followed by all characters in b; the operator + in this context means to concatenate, rather than to add numerically. The designers of Python and many other languages that implement this operator think of concatenation as string addition.

To use this to create the image file name, simply concatenate “.jpg” to the string fname:

fname = fname + ".jpg"

The result is that fname contains “image.jpg.”
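Because a string cannot be changed in place, editing always means building a new string. The sketch below shows the rejected assignment and two ways of building a replacement; the variable names assignment_allowed, capitalized, and with_suffix are introduced here for illustration.

```python
fname = "image"

# Direct assignment to a character is rejected with a TypeError.
try:
    fname[0] = "I"
    assignment_allowed = True
except TypeError:
    assignment_allowed = False
print(assignment_allowed)          # False

# Build a new string from a replacement and a slice instead.
capitalized = "I" + fname[1:]      # "Image"
with_suffix = fname + ".jpg"       # "image.jpg"
print(capitalized, with_suffix)
```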

File suffixes are very often the subject of string manipulations and provide a good example of string editing. For instance, given a file name stored as a string variable fname, is the suffix .jpg? Based on the preceding discussion, the question can be answered using a simple if statement:

if fname[len(fname)-4:len(fname)] == '.jpg':

Omitting the second index of the slice, it could also take the form

if fname[len(fname)-4:] == ".jpg":

A valuable thing to know is that negative indices index from the right side of the string, that is, from the end. Therefore, fname[-1] is the final character in the string, fname[-2] is the one previous to that, and so on. The last 4 characters, the suffix, are captured by using fname[-4:].
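A brief sketch comparing the three ways of extracting the suffix. The file name "photo.jpg" is a hypothetical example value.

```python
fname = "photo.jpg"   # hypothetical file name

print(fname[-1])      # g
print(fname[-2])      # p
print(fname[-4:])     # .jpg

# The three suffix tests select the same four characters.
n = len(fname)
same = (fname[n-4:n] == fname[n-4:] == fname[-4:] == ".jpg")
print(same)           # True
```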

Problem: Change the suffix of a file name

Some individuals use the suffix .jpeg instead of .jpg. Some programs allow this, others do not. Some code that would detect and change this suffix is as follows:
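One way this might look, written with the slicing and concatenation described above. This is a minimal sketch rather than the author's listing: the function name fix_jpeg_suffix and the sample file names are assumptions made for illustration.

```python
def fix_jpeg_suffix(fname):
    """Return fname with a trailing '.jpeg' replaced by '.jpg'."""
    if fname[-5:] == ".jpeg":
        # Keep everything before the suffix and append the short form.
        return fname[:-5] + ".jpg"
    return fname

print(fix_jpeg_suffix("photo.jpeg"))  # photo.jpg
print(fix_jpeg_suffix("photo.jpg"))   # photo.jpg
```

Returning a new string, instead of trying to change fname in place, follows from the rule that the parts of a string cannot be modified.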

Problem: Reverse the order of characters in a string

There are things about any programming language that could be considered idioms. These are things that a programmer experienced in the use of that language would consider normal use, but that others might consider odd. This problem exposes a Python idiom. Given what is known so far about Python, the logical approach to string reversal might be as follows:

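# city has a legal value at this point
k = len(city)                    # length of the original string
for i in range(0, k):
    city = city + city[k-i-1]    # append the characters, last one first
city = city[len(city)//2:]       # keep only the reversed second half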

This reverses the string named city that exists prior to the loop and creates the reversed string. It does so in the following way:

  1. Let i be an index into the string city, starting at 0 and running to the final character.
  2. Index a character from the end of the string, starting at the final character and stepping backwards to 0. Since the last character is at index k-1, where k is the length of the original string, and the current index is i, the character to be used in the current iteration is city[k-i-1].
  3. Append city[k-i-1] to the end of the string. Alternatively, a new string rs could be created and this character appended to it during each iteration.
  4. After all the characters have been examined, the string city contains the original string at the beginning and the reversed string at the end. The first characters can be removed, leaving the reversed string only.
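The alternative mentioned in step 3, building the result in a separate string rs, avoids the final trimming step. A sketch of that variant follows; the function name reverse_with_loop is hypothetical.

```python
def reverse_with_loop(city):
    """Reverse city by appending its characters, last one first, to rs."""
    rs = ""
    k = len(city)
    for i in range(k):
        rs = rs + city[k - i - 1]
    return rs

print(reverse_with_loop("Denver"))  # revneD
```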

An experienced Python programmer would do this differently. The syntax for taking a slice has a variation that has not been discussed; a third parameter exists. A string slice can be expressed as

myString[a:b:c]

where a is the starting index, b is the final index+1, and c is the increment. If

str = "This string has 30 characters."

then str[0:30:2] is “Ti tighs3 hrces,” which is every second character. The increment represents the way the string is sampled; that is, every c-th character is copied into the result. Most relevant to the current example, the increment can be negative. The idiom for reversing a string is as follows:

print (str[::-1])

As has been explained, the value of str[:] is the whole string. Specifying an increment of -1 means that the whole string is still used, but it is scanned in reverse order, from the final character back to index 0. This is far from intuitive, but is probably the way that an experienced Python programmer would reverse a string. Any programmer should use the parts of any language that they comprehend very well, and should keep in mind the likely skill set of the people likely to read the code.
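The three-part slice can be checked against the 30-character example. In this sketch the variable is named text rather than str, an assumption made here so that the built-in name str is not hidden.

```python
text = "This string has 30 characters."

print(len(text))       # 30
print(text[0:30:2])    # Ti tighs3 hrces
print(text[::-1])      # .sretcarahc 03 sah gnirts sihT

# Reversing twice gives back the original string.
print(text[::-1][::-1] == text)   # True
```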

Problem: Is a given file name that of a Python program?

The file name of a Python program ends with the suffix .py. An obvious solution to this problem is to simply look at the last 3 characters in the string s to see if they match that suffix:

if s[len(s)-3:len(s)] == '.py':
    print("This is a Python program.")

But is PROGRAM.PY a legal Python program? It happens that it is, and so is program.Py and program.pY. What can be done here?

4. String Methods

A good way to do the test in this case is to convert the suffix to all uppercase or all lowercase letters before doing the comparison. Comparing the name against .py means it should be converted to lowercase, which is done by using a built-in method named lower:

s1 = s[len(s)-3:len(s)]
if s1.lower() == '.py':
    print("This is a Python program.")

The variable s1 is a string that contains the final 3 characters of s. The expression s1.lower() creates a copy of s1 in which all characters are lowercase. It’s called a method to distinguish it from a function, but they are very similar things. You should recall that a method is simply a function that belongs to one type or class of objects. In this case, lower() belongs to the type (or class) string. There could be another method named lower() that belongs to another class and that did a completely different action. The dot notation indicates that it is a method, and what class it belongs to: the same class of things that the variable belongs to. In addition, the variable itself is really the first parameter; if lower were a function, then it might be called by lower(s1) instead of s1.lower(). In the latter case, the “.” is preceded by the first parameter.
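The idea that the variable is really the first parameter can be seen by calling the method through its class. In this sketch the file name is a hypothetical value, and str.lower(s1) is shown only to illustrate the point; s1.lower() is the usual form.

```python
s = "PROGRAM.PY"   # hypothetical file name
s1 = s[-3:]

print(s1.lower())      # .py
print(str.lower(s1))   # .py  (same method, with s1 passed explicitly)
print(s1)              # .PY  (the original string is unchanged)
```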

Strings all have many methods. In Table 3.1, the variable s is the target string, the one being operated upon. This means that the method names below appear following s., as in s.lower(). Let the value of s be given by s = “hello to you all.” These methods are intended to provide the operations needed to make the string type in Python function as a major communication device from humans to a program.
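A few of the standard string methods, applied to the sample value. This is a small illustrative selection, not a reproduction of Table 3.1, and it assumes that the value of s has no trailing period.

```python
s = "hello to you all"   # assumed value, without a trailing period

print(s.upper())                 # HELLO TO YOU ALL
print(s.capitalize())            # Hello to you all
print(s.find("you"))             # 9   (index where "you" starts)
print(s.count("l"))              # 4
print(s.replace("all", "two"))   # hello to you two
print(s.split())                 # ['hello', 'to', 'you', 'all']
print(s.startswith("hello"), s.endswith("all"))   # True True
```

None of these calls changes s; each one returns a new value.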

5. Spanning Multiple Lines

Text as seen in human documents may contain many characters, even multiple lines and paragraphs. A special delimiter, the triple quote, is used when a string constant is to span many lines. This has been mentioned previously in the context of multi-line comments. The regular string delimiters will terminate the string at the end of the line. The triple quote consists of either of the two existing delimiters repeated three times. For example, to assign the first stanza of Byron’s poem “She Walks in Beauty” to the string variable poem, we would write the following code:

poem = '''She walks in beauty like the night
Of cloudless climes and starry skies,
And all that's best of dark and bright
Meets in her aspect and her eyes;
Thus mellow'd to that tender light
Which Heaven to gaudy day denies.'''

When poem is printed, the line endings appear where they were placed in the constant. This example is a particularly good one in that most poems require that lines end precisely where the poet intended.
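The line endings are stored in the string as newline characters. The sketch below uses only the first two lines of the stanza, and the splitlines() method, which the text has not introduced, to make them visible.

```python
stanza = '''She walks in beauty like the night
Of cloudless climes and starry skies,'''

print("\n" in stanza)      # True: the line break is part of the string

lines = stanza.splitlines()
print(len(lines))          # 2
print(lines[1])            # Of cloudless climes and starry skies,
```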

Another example of a string that must be presented just as typed is a Python program. A program can be placed in a string variable using a triple quote:

program = """list = [1,2,4,7,12,15,21]
for i in list:
    print(i, i*2)"""

When printed, this string has the correct form to be executed by Python. In fact, the following statement executes the code in the string:

exec (program)
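exec() also accepts a dictionary in which the executed code keeps its variables, which makes the effect of the code easy to inspect. This is a sketch under assumptions: the program text is a variation on the one above, and the names values, doubled, and namespace are introduced here for illustration. Text from an untrusted source should not be passed to exec().

```python
program = """values = [1, 2, 4, 7, 12, 15, 21]
doubled = []
for i in values:
    doubled.append(i * 2)
"""

namespace = {}             # variables created by the code end up here
exec(program, namespace)
print(namespace["doubled"])
```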

6. For Loops Again

Earlier in this section, a for loop was written to print each character in the string. That loop was as follows:

for i in range(0, len(name)):
    print(name[i], end="")

Obviously, the string could have been printed using

print(name)

but it was being used as an example of indexing individual components within the string. The characters do not need to be indexed explicitly in Python; the loop variable can be assigned the value of each component:

for i in name:
    print(i, end="")

In this case, the value of i is the value of the component, not its index. Each component of the string is assigned to i in turn, and there is no need to test for the end of the string or to know its length. This is a better way to access components in a string and can be used with all sequence types. Whether an index is used or the components are pulled out one at a time depends on the problem being solved; sometimes the index is needed, other times it is not.
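When both the index and the component are needed, the built-in enumerate() supplies them together. The text has not introduced enumerate(), so the sketch below is offered as one possible approach; the variable names are chosen for illustration.

```python
name = "John Doe"

# Loop over indices, then look up each component.
by_index = ""
for i in range(len(name)):
    by_index = by_index + name[i]

# Loop over the components directly.
by_value = ""
for ch in name:
    by_value = by_value + ch

# enumerate() yields (index, component) pairs.
space_positions = []
for i, ch in enumerate(name):
    if ch == " ":
        space_positions.append(i)

print(by_index == by_value == name)   # True
print(space_positions)                # [4]
```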

 

Source: Parker James R. (2021), Python: An Introduction to Programming, Mercury Learning and Information; Second edition.
