Formatted Text, Formatted I/O in Python

There is a generally believed theory that if numbers line up in nice columns, then they must be correct. This is obviously not true, but appearances can matter a great deal, and numbers that do not line up properly for easy reading look sloppy and give people the impression that they may not be as carefully prepared as they should have been. The Python print() function as used so far prints a collection of variables and constants with no real attention to a format. Each item is printed in the order specified with a space between the items. Sometimes that’s good enough.

Python has included a string format() method since version 2.6 (the empty, automatically numbered braces used below require 2.7 or 3.1). It allows a programmer to specify how values should be placed within a string. The idea is to create a string that contains the formatted output, and then print the string. A simple example is as follows:

    s = "x={} y={}"
    fs = s.format(121.2, 6)

The string fs now contains “x=121.2 y=6”. The braces within the format string s hold the place for a value. The format() method lists values to be placed into the string, and with no other information given, it does so in order of appearance (in this case, 121.2 followed by 6). The first pair of braces is replaced by the first value, 121.2, and the second pair of braces is replaced by the second value, which is 6. Now the string fs can be printed.

This is not how it is usually done, though. Because this is usually part of the output process, it is often placed within the print() call:

    print("x={} y={}".format(121.2, 6))

where the format() method is referenced from the string constant. No actual formatting is done by this particular call, merely a conversion to string and a substitution of values. The way formatting is done depends on the type of the value being formatted, the most common types being strings, integers, and floats.
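As a brief sketch of the substitution step, the placeholders can be filled in order of appearance, by explicit position, or by keyword. The variable names and the keyword names x and y below are chosen only for illustration:

```python
# Placeholders filled in order of appearance.
in_order = "x={} y={}".format(121.2, 6)

# Placeholders filled by explicit position; the indexes may be reordered.
by_index = "y={1} x={0}".format(121.2, 6)

# Placeholders filled by keyword (x and y are illustrative names).
by_name = "x={x} y={y}".format(x=121.2, y=6)

print(in_order)
print(by_index)
print(by_name)
```

In all three cases the values are only converted to strings and substituted; no width, alignment, or precision is applied.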

1. Example: NASA Meteorite Landing Data

NASA publishes a large amount of data on its websites, and one of these is a collection of meteorite landings. It covers many years and has over 4,800 entries. The task assigned here is to print a nicely formatted report on selected parts of the data. The data in the file has its fields separated by commas, and there are ten of them: name, id, nametype, recclass, mass, Fall, year, reclat, reclong, and GeoLocation. The report requires that the name, recclass, mass, reclat, and reclong be arranged in a nicely formatted set of columns.

Reading the data involves opening the file, which is named “met.txt”, calling readline() to get each line, and then creating a list of the fields using split(","). If this is done and the fields are printed using print(), the result is messy. An abbreviated example is as follows (simulated data here):

    infile = open("met.txt", "r")
    inline = infile.readline()
    while inline != "":                 # readline() returns "" at end of file
        inlist = inline.split(",")      # one list entry per field
        mass = float(inlist[4])
        lat = float(inlist[7])
        long = float(inlist[8])
        # The fields are printed as read, with no formatting applied
        print(inlist[0], inlist[3], inlist[4], inlist[7], inlist[8])
        inline = infile.readline()
    infile.close()

The result is, as predicted, messy:

    Ashdon H5 121.13519985254874 89.85924301385958 -126.27404435776049
    Arbol Solo H6 66.94777134343516 25.567048824444797 160.58088365396014
    Baldwyn L6 47.6388587105465 -7.708508536783924 -81.22266156597777
    Ankober L6 15.265523451122064 -32.01862330869428 102.31244557598723
    Ankober LL6 57.584802700693885 -84.85880091616322 106.31130649523368
    Ash Creek L6 62.130089525516155 76.02832670618457 -140.03422105516938
    Almahata Sitta LL5 30.476879105555653 -12.906745404586 47.411816322674

Nothing lines up in columns, and the numbers show an impossible degree of precision. There are also no headings.

The first field printed is called name, and is a string; it is the name of the location where the observation was made. The print() function simply adds a space after printing it, so the next item is printed immediately following it, and things do not line up. Formatting a string for output involves specifying how much space to allow and whether the string should be centered or aligned to the left or right side of the area where it will be printed. Applying a left alignment to the string variable named placename in a field of 16 characters would be done as follows:

    '{:16s}'.format(placename)

The braces, which have previously been empty, contain formatting directives. Empty braces mean no formatting, and simply hold the place for a value. A full format could contain a name, a conversion part, and a specification:

    { [name] ['!' conversion] [':' specification] }

where optional parts are in square brackets. Thus, the minimal format specification is ‘{}’. In the example ‘{:16s}’, there is no name and no conversion part, only a specification. After the “:” is “16s”, meaning that the data to be placed here is a string, and that 16 characters should be allowed for it. It will be left aligned by default, so if placename was “Atlanta”, the result of the formatting would be the string “Atlanta         ” (the name followed by nine spaces), left aligned in a 16-character string. Unfortunately, if the original string is longer than 16 characters, it will not be truncated; all of its characters will be placed in the resulting string, even if that makes it too long.
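The following sketch illustrates the three parts of a replacement field and the behavior with over-long strings. The long meteorite name is used only as an illustration, and the precision (“.16”) shown at the end is a standard way to cap the number of characters, not something the example program relies on:

```python
placename = "Atlanta"

# Specification only: a string padded on the right to 16 characters.
padded = "{:16s}".format(placename)

# Name, conversion, and specification together: argument 0, converted
# with repr() (the "!r" conversion), then right aligned in 12 characters.
full = "{0!r:>12}".format(placename)

# A width alone never truncates: this 21-character name stays whole.
long_name = "Northwest Africa 7034"   # illustrative value
too_long = "{:16s}".format(long_name)

# Adding a precision (".16") limits the result to 16 characters.
clipped = "{:16.16s}".format(long_name)

for text in (padded, full, too_long, clipped):
    print("[" + text + "]", len(text))
```

The square brackets in the printed output only make the padding visible.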

To right align a string, place a “>” character immediately following the “:”. So

    "{:>16s}".format("Atlanta")

results in “         Atlanta” (nine spaces followed by the name). Placing a “<” character there does a left alignment (the default for strings) and “^” means to center the value in the available space. The alignment specifications apply to numbers as well as strings, although numbers are right aligned by default.
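A short sketch of the three alignment characters, applied to both a string and a number. The fill character in the last line (a dot placed before the alignment character) is an extra feature of the standard specification, shown here only as an illustration:

```python
name = "Atlanta"

left = "{:<16s}".format(name)       # name, then padding
right = "{:>16s}".format(name)      # padding, then name
centered = "{:^16s}".format(name)   # padding on both sides

# The same alignment characters work for numbers.
num_left = "{:<8d}".format(42)
num_right = "{:>8d}".format(42)

# An optional fill character may precede the alignment character.
dotted = "{:.<16s}".format(name)

for text in (left, right, centered, num_left, num_right, dotted):
    print("[" + text + "]")
```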

The first two values to be printed in the example are the city name, which is in inlist[0], and the meteorite class, which is inlist[3]. Formatting these is done as follows:

    s = '{:16s} {:10s}'.format(inlist[0], inlist[3])

Both strings are left aligned.

Numeric formats are more complicated. For integers, there is the total space to allow, and also how to align it and what to do with the sign and leading zeros. The formatting letter for an integer is “d”; Table 8.2 shows the legal directives and their meanings.
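The sketch below illustrates a few common integer directives covering width, alignment, sign, and leading zeros. It uses standard format specifiers and is an illustration only, not a reproduction of Table 8.2:

```python
n = 42

plain = "{:d}".format(n)          # no width: just the digits
wide = "{:6d}".format(n)          # width 6, right aligned by default
left = "{:<6d}".format(n)         # width 6, left aligned
zeros = "{:06d}".format(n)        # width 6, padded with leading zeros
signed = "{:+d}".format(n)        # always show the sign
spaced = "{: d}".format(n)        # a space where a plus sign would go
negative = "{:+06d}".format(-n)   # sign first, then zero padding

for text in (plain, wide, left, zeros, signed, spaced, negative):
    print("[" + text + "]")
```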

Floating point numbers have the extra issue of the decimal point. The format character is often “f”, but it can be “e” for exponential format or “g” for general format, meaning the system decides whether to use “f” or “e”. Otherwise, the formatting of a floating point number is like that of previous versions of Python and like that of C and C++.

The next three values that are printed are floating point numbers: the mass of the meteorite and the location, as the latitude and longitude. Each of these is printed in a field 7 characters wide, with 2 digits to the right of the decimal point (‘{:7.2f}’).
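As a small sketch, the three floating point format characters can be compared using values taken from the first line of the unformatted output shown earlier. The precision chosen for the “e” example and the very small number are illustrative choices:

```python
mass = 121.13519985254874
longitude = -126.27404435776049

fixed = "{:7.2f}".format(mass)           # 7 wide, 2 decimal places
negative = "{:7.2f}".format(longitude)   # the sign counts toward the width
exponent = "{:.3e}".format(mass)         # exponential notation
general = "{:g}".format(mass)            # "g" picks fixed notation here...
tiny = "{:g}".format(0.0000121)          # ...and exponential notation here

for text in (fixed, negative, exponent, general, tiny):
    print("[" + text + "]")
```

Note that the negative longitude fills all 7 characters, which is why the example program leaves extra spaces between the numeric columns.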

The solution to the problem is as follows. The data is read line by line, converted into a list, and then the fields are formatted and printed in two steps:

    infile = open("met.txt", "r")
    inline = infile.readline()
    print(" Place         Class Mass Latitude Longitude")
    while inline != "":
        inlist = inline.split(",")
        mass = float(inlist[4])
        lat = float(inlist[7])
        long = float(inlist[8])
        # First step: name, class, and mass, with no newline at the end
        print('{:16s}  {:14s} {:7.2f}'.format(inlist[0],
              inlist[3], mass), end="")
        # Second step: latitude and longitude complete the line
        print('   {:7.2f}     {:7.2f}'.format(lat, long))
        inline = infile.readline()
    infile.close()

The results are as follows:

There are many more formatting directives and a large number of combinations.
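One such combination, sketched here under stated assumptions, is to build the heading line with the same column widths as the data lines so that the two always agree. The three rows are copied from the sample values shown earlier, held in a list rather than read from “met.txt”; the widths and the names head_format and row_format are illustrative choices and differ from those in the program above:

```python
# Sample rows: (name, recclass, mass, reclat, reclong).
rows = [
    ("Ashdon", "H5", 121.13519985254874, 89.85924301385958, -126.27404435776049),
    ("Arbol Solo", "H6", 66.94777134343516, 25.567048824444797, 160.58088365396014),
    ("Almahata Sitta", "LL5", 30.476879105555653, -12.906745404586, 47.411816322674),
]

# Headings and data share the same widths; only the type letters differ.
head_format = "{:16s} {:10s} {:>9s} {:>9s} {:>10s}"
row_format = "{:16s} {:10s} {:>9.2f} {:>9.2f} {:>10.2f}"

lines = [head_format.format("Place", "Class", "Mass", "Latitude", "Longitude")]
for name, recclass, mass, lat, lon in rows:
    lines.append(row_format.format(name, recclass, mass, lat, lon))

print("\n".join(lines))
```

Because every line is produced from a format with identical widths, each one comes out the same length, and a change to a column width only has to be made in one place per format.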

 

Source: Parker James R. (2021), Python: An Introduction to Programming, Mercury Learning and Information; Second edition.
