Most people know what sorting is and can sort a small sequence of numbers in a few seconds. Each may have a distinct strategy for doing it, but few can explain to someone else how to sort an arbitrary set of numbers. They themselves may not know how they do it; they can simply tell when something is sorted, and have some process for sorting in mind. In short, the process of sorting is one of the simplest things that is hard to describe.
Because sorting is so important in computer science, it has been studied at great length. Sorting involves placing things in an order defined by a function that ranks them somehow. For numbers, ranking means using the numerical value. The sequence 1, 3, 2 is not in proper order, but 1, 2, 3 is in ascending (getting larger) order and 3, 2, 1 is in descending (getting smaller) order. Formally, a sequence s is in ascending order if si <= si-1 for all i. The act of sorting means arranging the values in a sequence so that this is true. It is clear that it can be decided when a sequence is sorted.
How can a sequence be placed in sorted order? By using a sorting algorithm, of course. For all of the following discussion on sorting, assume that the problem is to sort into ascending order.
1. Selection Sort
Small sequences are easier to sort than longer ones, and may provide some insight into the process. The sequence
is not sorted in ascending order, but testing this is easy and fixing it is trivial: simply swap the two values. The longer sequence
8 4 9
is also not sorted but is more difficult to sort because it is longer and there are more combinations of the numbers that are unsorted. How can this sequence be placed in order? Here’s one idea:
- Find the smallest element in the list.
- Swap that element for the element at the beginning of the list.
- Find the smallest element in the rest of the list.
- Swap that element for the second element in the list.
… and so on until the list is sorted.
This is called the selection sort algorithm, because at each stage, it selects the smallest of the unsorted items in the list and places it where it belongs. Consider the following list:
[12, 18, 5, 21, 9]
0 1 2 3 4 – index
The smallest element in this list is 5, at index 2. Swap element 2 for element 0:
[5, 18, 12, 21, 9]
The bold elements above are in sorted order, which here is only the one at location 0. For the remainder of the elements, repeat the process of finding the smallest element and placing it at the beginning of the unsorted list (element 1). That means swapping 9 for 18, element 4 for element 1:
[5, 9, 12, 21, 18]
Repeating, it turns out that element 2, value 12, is now the smallest, and it is in the correct place.
[5, 9, 12, 21, 18]
Now the value 18 is smallest and should be placed at location 3.
[ 5, 9, 12, 18, 21]
Now the sort is complete. When only one remains, it must be in the correct place.
Finding the smallest element in a list involves three things. First, begin with the initial element and assume that is it the smallest. Identify it using its index imin. Next, check the value of all successive elements in the list (from imin to the end of the list) against the value at imin. Finally, in the case where one of the successive values at index k is smaller than the one at index imin, set imin to k to indicate where a new smallest value was found. In simple, imprecise English, scan all of the elements above imin and remember the location of the smallest one. Presuming that the list to be sorted is named data, the code for finding the smallest element from imin to the end of the list is as follows:
for i in range (imin, len(data)):
if data[i] < data[imin]:
imin = i
This code does work, but it modifies imin, which is used to determine the loop bounds, within the loop itself. This can be confusing. It is better to code this loop as:
imin = istart
for i in range (iend, len(data)):
if data[i] < data[imin]:
imin = i
What happens after this is to swap the smallest value found for the one at location istart. In most programming languages, this would take three statements, which would look something like this:
temp = data[imin]
data[imin] = data[istart]
data[istart] = temp
In Python, this swap can be performed using a different syntax:
(data[istart], data[imin]) = (data[imin], data[istart])
This is the core of the algorithm, and needs to be done for all values of imin; that is, from 0 to len(data)-1. This is another for loop within which this code is placed. The outer loop would be coded as follows:
for istart in range (0, len(data)-l):
This is all that is needed for the sort. Writing it as a function, it looks like this:
def selection (data):
for istart in range (0, len(data)-l): imin = istart
for i in range(istart,len(data)):
if data[i] < data[imin]: imin = i
(data[istart], data[imin]) = (data[imin], data[istart])
This sorting method appears to be natural to humans. It is the one most often described by students when asked how they sort numbers. It is not the fastest in many cases, but does a small number of swaps. If the data is already sorted, it does no swaps; if it is in reverse order, it does len(data)-1 swaps, the smallest that can be done and still sort the list. When looking at algorithms, it is common to define a worst case and a best case, and to define performance not in seconds, but in terms of one of the operations performed. In that way the nature of the computer, whether it is fast or slow, does not affect the analysis. For sorting, it is common to select the operation to be used as a basis for comparison to be the compare operation: data[i]<data[imin]. How many of these are done?
The best case for the selection sort occurs when the list is already sorted. In that case, it will perform close to N2 comparisons, where N = len(data). This is the same number of comparisons needed for the worst case, in which the list is in reverse order. At least it is consistent. However, it minimizes the number of times swaps occur, and if swapping is expensive, then this could be the sorting method to choose.
The selection sort is unstable. If there are repeated values in the data, then they will be together in the final, sorted list. However, if a sort is stable, they will remain in the same order they were originally. The selection sort does not guarantee this. It seems as if this is a minor thing, but it does matter in some cases. Consider a list of names in a list that are given, in order of some sort of score, on a Web page. Names for tie scores should always be in the same order on the page, so that if the page is refreshed or a link is followed, the page looks the same.
It should be said here that generally there is no best sorting method. The properties of such a method are as follows:
- Selection sort is N2 in terms of comparisons. The best one can normally expect from any sort would be N*log(N) in the worst case.
- Does not need extra space. This means that the array can be sorted in place, with perhaps a temporary variable for performing swaps.
- Performs no more than N swaps in the worst case.
- Adaptive. The method detects when it is finished instead of looping through unproductive iterations. If, for example, such a method is given an already sorted list, it will finish in a single pass through the data.
No method has all of these characteristics.
2. Merge Sort
Let’s look at an algorithm that is different from the selection sort, and that has properties that it does not have. The method named merge sort fits that description nicely: it is an N*log(N) sort, it does need extra space, and it uses more than N swaps, but it is stable.
Merge sort is an example of a divide and conquer style of algorithm, in which a problem is repeatedly broken up into sub-problems, often using recursion, until they are small enough to solve; the solutions are combined to solve the larger problem. A merge sort breaks the data into parts that can be sorted trivially, then combines those parts knowing that they are sorted. Using the sample data from the selection sort example, the first step in the merge sort is to split the data into two parts. There are 5 elements in this list, and the middle element would be at 5//2, or 2, so the two parts are:
[12, 18] [5, 21, 9]
Splitting again, the first set has 2 elements, the middle being at 0; the second set has 3 elements, so split at 1:
   [21,1]
The final split breaks the data into individual components:
    
The splitting is done in such a way that the original locations are remembered. This happens in the recursive solution, but could be done in other ways. One way to visualize this is as a tree structure.
This completes the divide portion of the divide and conquer. Now that the individual elements are available, it is easy to sort them, as pairs. On the lower right the pair,  and  are out of order, so they must be swapped with each other. Now they are sorted. On the next level upwards, looking from left to right, the elements are sorted, although most are single elements:
Moving up again,  and  are combined to make [12,18], a sorted pair. On the right, the singleton  is merged with the pair [9,21] by looking at the beginning of each list and copying the smallest element of the pair into a new list:
The result is
At each stage, the lists contain more elements and they are sorted internally, the smallest element at the beginning. Combining a pair of these is simply a matter of looking at the element at the beginning of each and copying the smallest one to the result until the lists are empty. The next, and final, merge in this set of data is as follows:
Once the data has been split into individual components, the merge stage creates sorted pairs, the next merge creates sets of 4 sorted numbers, the next 8, and so on, doubling each time until they are all sorted. A logical way to write the program is to use recursion, where each recursive call splits the data in two more parts until there is only one element. The lowest level of recursion combines the individuals into sorted pairs, and returns to the next level where the pairs are combined into fours, then eights, and so on, until at the highest level the list is completely sorted. Written as a recursive function this is
The merge sort is not as obvious as was selection sort, but is faster in most cases. It has another interesting application: it can be used to sort files. If a file contains, for example, a billion data samples that need to be sorted it is unlikely that they can be read into memory and sorted with a selection sort. How then can we sort them?
Source: Parker James R. (2021), Python: An Introduction to Programming, Mercury Learning and Information; Second edition.