21 Mar, 2009 · 23 minutes read
Retrieving data quickly is one of the major tasks of an efficient data management system. Sorting helps in retrieving data faster by storing data in a particular order. There are various sorting algorithms that can be used to arrange data in a specific order.
This tutorial discusses various sorting algorithms and their implementation. In addition, it compares the efficiency of various sorting algorithms.
Consider that you need to retrieve the telephone number of a person named Steve in a telephone directory, where the names in the telephone directory are stored in a random order.
To retrieve the desired record, you need to sequentially traverse the list of names one by one because the names are not sorted, which is a tedious and time consuming activity. When you have to retrieve a record from a huge volume of data, this activity becomes even more difficult.
A simple solution to save time, and search data efficiently is sorting. Sorting is the process of arranging data in some pre-defined order or sequence. The order can be either ascending or descending.
If the data is sorted, you can directly go to the section that stores the names starting with ‘S’, thereby reducing the number of records to be traversed.
There are different types of sorting algorithms that can help you sort data in a particular order. Even when two algorithms have the same efficiency, there can be situations when one works better than the other.
There are various sorting algorithms that are used to sort data. Some of them are:-
Because there are a number of sorting algorithms, it becomes important to understand which sorting algorithm to use in a particular situation. To select an appropriate algorithm, you need to consider the following criteria in the suggested order:-
Consider a situation where the data that needs to be sorted is small in quantity. To sort this data, all the sorting algorithms will use a reasonable amount of storage space. In addition, the sorting algorithms will be executed in a reasonable amount of time.
Therefore, in this situation, the criteria for selecting the sorting algorithm would be the programming effort involved. An algorithm that requires less programming effort would be preferred over an algorithm that requires more programming effort .
Consider another situation where the data that needs to be sorted is large in size. In such a situation, the time taken by different algorithms may differ drastically because of the difference in their orders of growth. For example, when there are a large number of elements, an algorithm with a logarithmic order of growth will execute much faster than an algorithm with a quadratic order of growth.
In addition, with increase in data, the space requirement of different algorithms may also differ drastically. Therefore, when the data is large, you need to select a sorting algorithm that makes most efficient use of time or memory, depending upon the requirement.
Bubble sort is one of the simplest sorting algorithms. This algorithm has a quadratic order of growth and is therefore suitable for sorting small lists only. The algorithm works by repeatedly scanning through the list, comparing adjacent elements, and swapping them if they are in the wrong order. The algorithm gets its name from the way smaller elements “bubble’ to the top of the list after being swapped with the greater elements.
To understand the implementation of bubble sort algorithm, consider an unsorted list of numbers stored in an array. Suppose there are n elements in the array.
To implement bubble sort algorithm, you need to traverse the list multiple times. The process of traversing the entire list once is called a pass .It can be said that sorting is performed in multiple passes.
In Pass 1, you compare the first two elements and interchange their values, if the first number is greater than the second number. Then, you compare the second and the third elements, and interchange their values if they are not in the correct order. You repeat this process till (n – 1) thelement is compared with the n thelement. The total number of comparisons in Pass 1 is therefore, n – 1.
By the end of Pass 1, the largest element is placed at the n thposition in the array. The
number of comparisons is one less than the total number of elements in the list.
In Pass 2, you repeat the same process as in Pass 1, but stop the comparison after comparing the element at the (n – 2) thposition with the element at the (n – 1) thposition. This time the number of comparisons required will be one less than in Pass 1. The total number of comparisons in Pass 1 is therefore, n – 2.
By the end of Pass 2, the second largest number will be placed at the (n – 1) thposition in
the array.
In Pass 3, you repeat the same process as in Pass 2, and this time there will be n – 3 comparisons. This means that the comparison will stop after comparing the element at the (n
– 3) thposition with the element at the (n – 2) thposition.
Continuing the same process in all subsequent passes, in the (n – 1) thpass you will have to perform only one comparison. After the completion of this pass, the list will be sorted in the ascending order.
Note: To sort a list with n elements by using bubble sort, n– 1 passes are required
Consider an example. You have an unsorted list containing the ranks of students based on the results of an exam. You need to sort these ranks in the ascending order. The list of ranks is stored in an array, as shown in the following figure:-
There are five elements in the list, therefore four passes would be required to sort the list.
In Pass 1, you need to perform the following steps:-
At the end of Pass 1, the largest element is placed at the last index position. The preceding process will be repeated in all the subsequent passes. However, the value at index 4 will not be compared in any of the subsequent passes because it has already been placed at its correct position.
In Pass 2, you need to perform the following steps:-
At the end of this pass, the second largest element is placed at its correct position in the list. You will repeat the same process in the subsequent passes. However, the values at Index 3 and 4 will not be compared in the subsequent passes, because they are already placed at their correct position.
The result after Pass 3 would be as follows:-
Similarly, the result after Pass 4 would be as follows.
After Pass 4, the list is completely sorted.
The algorithm to sort the list by using the bubble sort algorithm is as follows:-
Note:If in the preceding example, instead of checking that the element at index j is greater than the element at index j + 1, you check that the element at index j is less than the element at index j+ 1, then the list will be sorted in the descending order.
The efficiency for a sorting algorithm is measured in terms of the number of comparisons. The number of comparisons in bubble sort can be easily computed. In bubble sort, there are n – 1 comparisons in Pass 1, n – 2 comparisons in Pass 2, and so on. Therefore, the total number of comparisons will be: (n – 1) + (n – 2) + (n – 3) + … + 3 + 2 + El,which is an arithmetic progression.
The formula for determining the sum of an arithmetic progression is:-
Sum = n/2 [2a + (n – 1)d]
Where,
n is the total number of elements in the arithmetic progression.
a is the first element in the arithmetic progression.
d is the step value, that is, the difference in successive terms of an arithmetic progression.
For the preceding arithmetic progression,
n = n – 1
a = 1
d = 1
Therefore,
Sum = (n -1)/2 [2 x 1 + (n – 1 – el) x el]
Sum = (n – 1)/2 [2 + (n – 2)]
Sum = (n -1)/2 (n)
Sum = n(n – 1)/2
n(n – 1)/2is of O(n 2) order. Therefore, the bubble sort algorithm is of the order O(n 2). This means that the time taken to execute the algorithm increases quadratically with the increase in the size of the array.
Suppose it takes 100 ns to execute the algorithm on a list of 10 elements. Now, if the number of elements is doubled, that is, the number of elements is increased to 20, the execution time will increase to 400 ns, which is four times the time taken to execute the algorithm on 10 elements.
Given below is a function that takes an unsorted integer array and sorts it using the Bubble Sort Algorithm.
public int[] BubbleSortArray(int[] a)
{
for (int i = 1; i < a.Length; i++) // For n - 1 passes
{
// In pass i, compare the first n - i elements with their next elements
for (int j = 0; j < a.Length - 1; j++)
{
if (a[j] > a[j + 1])
{
int temp;
temp = a[j];
a[j] = a[j + 1];
a[j + 1] = temp;
}
}
}
return a;
}
Like bubble sort, selection sort is another simple sorting algorithm. Selection sort also has a quadratic order of growth, and is therefore used for sorting small lists only.
Selection sort scans through the list iteratively, selecting one item in each scan, and moving the item to its correct position in the list. In contrast to bubble sort, where you need to perform n – r swaps in the r thpass, selection sort performs a maximum of one swap per pass.
Similar to bubble sort, selection sort also involves multiple passes through the list of elements. In Pass 1, you locate the smallest element from the list, and swap it with the element at the first position in the list. At the end of Pass 1, the smallest value will be placed at its correct position in the list.
In Pass 2, you locate the next smallest value, and swap it with the element placed at the second position in the list. Similarly, you repeat the steps until all the values are placed at the correct positions in the list. If there are n elements in the list, the list will be sorted in n – 1 passes.
Suppose, you need to sort the following list of elements stored in an array by using the
selection sort algorithm.
To sort this list by using the selection sort algorithm, you need to perform the following
steps:-
The smallest element needs to be swapped with the element at index 2. However, the smallest element is already at index 2. Therefore, you do not need to swap its position with another element.
The list is now completely sorted.
The algorithm to sort the list by using the selection sort algorithm is as follows:-
To determine the efficiency of selection sort, you need to determine the number of comparisons in selection sort. There are n – 1 comparisons during Pass 1 to find the smallest element. There are n – 2 comparisons during Pass 2 to find the second smallest
element, and so on.
Therefore, the total number of comparisons will be: –
(n – 1) + (n – 2) + (n – 3) + … + 3 + 2 + 1
This is the same as bubble sort. Therefore, the selection sort algorithm is of the order O(n 2) . This means that the time taken to execute the algorithm increases quadratically with the increase in the size of the array.
Given below is a C# implementation of the above algorithm:-
public int[] SelectionSortArray(int[] a)
{
for (int j = 0; j < a.Length - 1; j++)
{
int MinIndex = j;
for (int i = j + 1; i < a.Length; i++)
{
if (a[i] < a[MinIndex])
{
MinIndex = i;
}
}
int Temp = a[MinIndex];
a[MinIndex] = a[j];
a[j] = Temp;
}
return a;
}
Similar to bubble and selection sort that have a quadratic order of growth, insertion sort also has a quadratic order of growth, and is therefore used for sorting small lists only.
However, insertion sort is much more efficient than bubble sort and selection sort, if the list that needs to be sorted is nearly sorted. This is because bubble sort and selection sort always perform the same number of comparisons, no matter what is the initial ordering of elements.
In contrast, insertion sort performs different number of comparisons depending on the initial ordering of elements. When the elements are already in the sorted order, insertion sort needs to make very few comparisons.
The insertion sort algorithm divides the list into two parts, sorted and unsorted. Initially, the sorted part contains only one element. In each pass, one element from the unsorted list is inserted at its correct position in the sorted list. As a result, the sorted list grows by one element and the unsorted list shrinks by one element in each pass.
Consider an unsorted list stored in an array that needs to be sorted by using the insertion
sort algorithm.
To sort this list by using the insertion sort algorithm, you need to divide the list into two sublists, sorted and unsorted. Initially, the sorted list contains only the first element and the unsorted list contains the remaining four elements, as shown in the following figure:-
Note:The list is not physically divided into two separate arrays. The division is only logical where the element at index 0’s considered to be in the sorted list and the elements at indexes I to 4 are considered to be in the unsorted list.
Now, to further sort the unsorted list, you need to perform a number of passes:
The sorted list has two elements now and the unsorted list has three elements.
The sorted list has three elements now and the unsorted list has two elements.
Here, 10 is smaller than the three elements in the sorted list, therefore, it needs to be stored at index position 0, as shown in the following figure.
The sorted list has four elements now and the unsorted list has one element.
The unsorted list is now empty and the sorted list contains all the elements. This
means that the list is now completely sorted.
The algorithm to sort the list by using the insertion sort algorithm is as follows:-
Consider an unsorted list of numbers with n elements. To sort this unsorted fist by using insertion sorting, you need to perform (n - 1I) passes. In insertion sort, if the list is already sorted, you will have to make only one comparison in each pass.
In n -1 passes, you will need to make n - 1 comparisons. This is the best case for insertion sort. Therefore, the best case efficiency of insertion sort is of the order O(n).
Now, consider a situation where initially, the list is stored in the reverse order. In this case, you need to make one comparison in the first pass, two comparisons in the second pass, three comparisons in Pass 3, and n - 1 comparisons in the (n - 1) thpass.
The total number of comparisons in this case is:-
Sum = 1 + 2 + ... + (n - 1)
This is the same as bubble sort and selection sort. Therefore, the worst case efficiency of
insertion sort is of the order O(n 2).
Given below is a C# implementation of the Insertion sort algorithm:-
public int[] InsertionSortArray(int[] a)
{
for (int i = 1; i < a.Length; i++)
{
int Temp = a[i];
int j = i - 1;
while (j >= 0 && a[j] > Temp)
{
a[j + 1] = a[j];
j--;
}
a[j + 1] = Temp;
}
return a;
}
Insertion sort is an efficient algorithm only if the list is already partially sorted and results in an inefficient solution in an average case. To overcome this limitation, a computer scientist, DL. Shell proposed an improvement over the insertion sort algorithm. The new algorithm was called shell sort after the name of its proposer.
The shell sort algorithm improves the insertion sort algorithm by grouping the elements separated by a distance of several positions to form multiple sublists. Once the list is divided into sublists, you apply insertion sort on each sublist to move the elements towards their correct positions. This helps an element to take a bigger step towards its correct position, thereby reducing the number of comparisons.
To understand the implementation of shell sort algorithm, consider an array a [0...n - 1]. To apply shell sort on this array, you need to select the distance by which the elements in
a group will be separated.
Suppose, initially you group the elements separated by a distance of four elements to create the following sublists:-
• a[0], a[4], a[8] ...
• a[1], a[5], a[9] ...
• a[2], a[6], a[10] ...
• a[3], a[7], a[11] ...
You can say that the sublists are created with an increment value of four. Each of the preceding sublists will be sorted by using insertion sort. In the next pass, the increment value will be reduced to three, and the elements will again be grouped to create the following sublists.
Each of the preceding sublists will again be sorted by using insertion sort. In the next pass, the increment value will be reduced to two, and the elements will again be grouped to create the following sublists:-
Each of the preceding lists Will again be sorted by using insertion sort. In the next pass, the increment value will be reduced to one. In this case, there will be only one list that will be almost sorted. Insertion sort will be applied on the list to sort it completely.
Consider an example of an unsorted list of 11 elements, as shown in the following figure.
To sort the given list by using the shell sort algorithm, you need to perform a number of passes. The number of passes would depend on the initial increment value. Suppose the initial increment value is three, then the list will be sorted in three passes of the complete
list.
Pass 1
In Pass 1, you group the elements with a distance of three to form three sublists, as shown in the following figure:-
After the list is divided into sublists, you sort the sublists by using the insertion sort algorithm on each sublist individually. Once the sublists are sorted, you combine them to form one single list.
The following figure depicts the result of Pass 1:-
Pass 2
In Pass 2, you group the elements separated by a distance of two to form two sublists, as shown in the following figure:-
After the list is divided into sublists, you sort the sublists by using the insertion sort algorithm. Once the sublists are sorted, you combine them to form a single list.
The following figure displays the result of Pass 2.
Pass 3
In Pass 3, you group the elements separated by a distance of one, as shown in the
following figure:-
In this case, there is only a single list. The list is more or less sorted. You can sort the list completely by applying the insertion sort algorithm. The following figure displays the
result of Pass 3.
The list is now sorted. The algorithm to sort the list by using the shell sort algorithm is as
follows:-
The efficiency of shell sorting can be determined with the help of mathematical analysis.
Although it is easy to develop an intuitive sense of how this algorithm works, it is very difficult to analyze its execution time. However, estimates range from O(n log 2n) to O(n 1.5) depending on the initial size of the increment value.
Note: The efficiency of this sorting algorithm is based on the size of the increments. It is recommended to use prime numbers as increment values because if the increment values are powers of 2 then the same elements are compared again in the next passes.
Given below is a C# implementation of the Shell sort algorithm:-
public int[] ShellSortArray(int[] a)
{
for (int incr = 3; incr > 0; incr--)
{
for (int L = 0; L < incr; L++)
{
for (int i = L + incr; i < a.Length; i += incr)
{
int temp = a[i];
int j = i - incr;
while (j >= 0 && a[j] > temp)
{
a[j + incr] = a[j];
j -= incr;
}
a[j + incr] = temp;
}
}
}
return a;
}
You will learn about other sorting techniques, such as quick sort, merge sort, and heap
sort in subsequent tutorials