Information about Sorting Algorithm
In computer science and mathematics, a sorting algorithm is an algorithm that puts elements of a list in a certain order. The most-used orders are numerical order and lexicographical order. Efficient sorting is important to optimizing the use of other algorithms (such as search and merge algorithms) that require sorted lists to work correctly; it is also often useful for canonicalizing data and for producing human-readable output. More formally, the output must satisfy two conditions:
Since the dawn of computing, the sorting problem has attracted a great deal of research, perhaps due to the complexity of solving it efficiently despite its simple, familiar statement. For example, bubble sort was analyzed as early as 1956.[1] Although many consider it a solved problem, useful new sorting algorithms are still being invented to this day (for example, library sort was first published in 2004). Sorting algorithms are prevalent in introductory computer science classes, where the abundance of algorithms for the problem provides a gentle introduction to a variety of core algorithm concepts, such as big O notation, divide-and-conquer algorithms, data structures, randomized algorithms, best, worst and average case analysis, time-space tradeoffs, and lower bounds.
When equal elements are indistinguishable, such as with integers, or more generally, any data where the entire element is the key, stability is not an issue. However, assume that the following pairs of numbers are to be sorted by their first component:
(4, 1) (3, 7) (3, 1) (5, 6)
In this case, two different results are possible, one which maintains the relative order of records with equal keys, and one which does not:
(3, 7) (3, 1) (4, 1) (5, 6) (order maintained) (3, 1) (3, 7) (4, 1) (5, 6) (order changed)
Unstable sorting algorithms may change the relative order of records with equal keys, but stable sorting algorithms never do so. Unstable sorting algorithms can be specially implemented to be stable. One way of doing this is to artificially extend the key comparison, so that comparisons between two objects with otherwise equal keys are decided using the order of the entries in the original data order as a tie-breaker. Remembering this order, however, often involves an additional space cost.
Sorting based on a primary, secondary, tertiary, etc. sort key can be done by any sorting method, taking all sort keys into account in comparisons (in other words, using a single composite sort key). If a sorting method is stable, it is also possible to sort multiple times, each time with one sort key. In that case the keys need to be applied in order of increasing priority.
Example: sorting pairs of numbers as above by first, then second component:
(4, 1) (3, 7) (3, 1) (4, 6) (original)
(4, 1) (3, 1) (4, 6) (3, 7) (after sorting by second component) (3, 1) (3, 7) (4, 1) (4, 6) (after sorting by first component)
On the other hand:
(3, 7) (3, 1) (4, 1) (4, 6) (after sorting by first component) (3, 1) (4, 1) (4, 6) (3, 7) (after sorting by second component, order by first component is disrupted)
This table describes sorting algorithms that are not comparison sorts. As such, they are not limited by a O(nlog n) lower bound. Complexities below are in terms of n, the number of items to be sorted, k, the size of each key, and s, the chunk size used by the implementation. Many of them are based on the assumption that the key size is large enough that all entries have unique key values, and hence that n << 2k.
This table describes some sorting algorithms that are impractical for real-life use due to extremely poor performance or a requirement for specialized hardware.
For example, the popular recursive quicksort algorithm provides quite reasonable performance with adequate RAM, but due to the recursive way that it copies portions of the array it becomes much less practical when the array does not fit in RAM, because it may cause a number of slow copy or move operations to and from disk. In that scenario, another algorithm may be preferable even if it requires more total comparisons.
One way to work around this problem, which works well when complex records (such as in a relational database) are being sorted by a relatively small key field, is to create an index into the array and then sort the index, rather than the entire array. (A sorted version of the entire array can then be produced with one pass, reading from the index, but often even that is unnecessary, as having the sorted index is adequate.) Because the index is much smaller than the entire array, it may fit easily in memory where the entire array would not, effectively eliminating the disk-swapping problem. This procedure is sometimes called "tag sort".[2]
Another technique for overcoming the memory-size problem is to combine two algorithms in a way that takes advantages of the strength of each to improve overall performance. For instance, the array might be subdivided into chunks of a size that will fit easily in RAM (say, a few thousand elements), the chunks sorted using an efficient algorithm (such as quicksort or heapsort), and the results merged as per mergesort. This is more efficient than just doing mergesort in the first place, but it requires less physical RAM (to be practical) than a full quicksort on the whole array.
Techniques can also be combined. For sorting very large sets of data that vastly exceed system memory, even the index may need to be sorted using an algorithm or combination of algorithms designed to perform reasonably with virtual memory, i.e., to reduce the amount of swapping required.
.
Summarizing,
..... Click the link for more information.
- The output is in nondecreasing order (each element is no smaller than the previous element according to the desired total order);
- The output is a permutation, or reordering, of the input.
Since the dawn of computing, the sorting problem has attracted a great deal of research, perhaps due to the complexity of solving it efficiently despite its simple, familiar statement. For example, bubble sort was analyzed as early as 1956.[1] Although many consider it a solved problem, useful new sorting algorithms are still being invented to this day (for example, library sort was first published in 2004). Sorting algorithms are prevalent in introductory computer science classes, where the abundance of algorithms for the problem provides a gentle introduction to a variety of core algorithm concepts, such as big O notation, divide-and-conquer algorithms, data structures, randomized algorithms, best, worst and average case analysis, time-space tradeoffs, and lower bounds.
Classification
Sorting algorithms used in computer science are often classified by:- Computational complexity (worst, average and best behaviour) of element comparisons in terms of the size of the list (n). For typical sorting algorithms good behavior is O(n log n) and bad behavior is Ω(n²). (See Big O notation) Ideal behavior for a sort is O(n). Sort algorithms which only use an abstract key comparison operation always need Ω(n log n) comparisons in the worst case.
- Computational complexity of swaps (for "in place" algorithms).
- Memory usage (and use of other computer resources). In particular, some sorting algorithms are "in place", such that only O(1) or O(log n) memory is needed beyond the items being sorted, while others need to create auxiliary locations for data to be temporarily stored.
- Recursion. Some algorithms are either recursive or non recursive, while others may be both (e.g., merge sort).
- Stability: stable sorting algorithms maintain the relative order of records with equal keys (i.e. values). See below for more information.
- Whether or not they are a comparison sort. A comparison sort examines the data only by comparing two elements with a comparison operator.
- General method: insertion, exchange, selection, merging, etc. Exchange sorts include bubble sort and quicksort. Selection sorts include shaker sort and heapsort.
Stability
Stable sorting algorithms maintain the relative order of records with equal keys (i.e. sort key values). That is, a sorting algorithm is stable if whenever there are two records R and S with the same key and with R appearing before S in the original list, R will appear before S in the sorted list.When equal elements are indistinguishable, such as with integers, or more generally, any data where the entire element is the key, stability is not an issue. However, assume that the following pairs of numbers are to be sorted by their first component:
(4, 1) (3, 7) (3, 1) (5, 6)
In this case, two different results are possible, one which maintains the relative order of records with equal keys, and one which does not:
(3, 7) (3, 1) (4, 1) (5, 6) (order maintained) (3, 1) (3, 7) (4, 1) (5, 6) (order changed)
Unstable sorting algorithms may change the relative order of records with equal keys, but stable sorting algorithms never do so. Unstable sorting algorithms can be specially implemented to be stable. One way of doing this is to artificially extend the key comparison, so that comparisons between two objects with otherwise equal keys are decided using the order of the entries in the original data order as a tie-breaker. Remembering this order, however, often involves an additional space cost.
Sorting based on a primary, secondary, tertiary, etc. sort key can be done by any sorting method, taking all sort keys into account in comparisons (in other words, using a single composite sort key). If a sorting method is stable, it is also possible to sort multiple times, each time with one sort key. In that case the keys need to be applied in order of increasing priority.
Example: sorting pairs of numbers as above by first, then second component:
(4, 1) (3, 7) (3, 1) (4, 6) (original)
(4, 1) (3, 1) (4, 6) (3, 7) (after sorting by second component) (3, 1) (3, 7) (4, 1) (4, 6) (after sorting by first component)
On the other hand:
(3, 7) (3, 1) (4, 1) (4, 6) (after sorting by first component) (3, 1) (4, 1) (4, 6) (3, 7) (after sorting by second component, order by first component is disrupted)
List of sorting algorithms
In this table, n is the number of records to be sorted. The columns "Average" and "Worst" give the time complexity in each case, under the assumption that the length of each key is constant, and that therefore all comparisons, swaps, and other needed operations can proceed in constant time. "Memory" denotes the amount of auxiliary storage needed beyond that used by the list itself, under the same assumption. These are all comparison sorts.| Name | Average | Worst | Memory | Stable | Method | Other notes |
|---|---|---|---|---|---|---|
| Bubble sort | — | O(n²) | O(1) | Yes | Exchanging | Times are for best variant |
| Cocktail sort | — | O(n²) | O(1) | Yes | Exchanging | |
| Comb sort | O(n log n) | O(n log n) | O(1) | No | Exchanging | Small code size |
| Gnome sort | — | O(n²) | O(1) | Yes | Exchanging | |
| Selection sort | O(n²) | O(n²) | O(1) | No | Selection | |
| Insertion sort | O(n + d) | O(n²) | O(1) | Yes | Insertion | d is the number of inversions, which is O(n²) |
| Shell sort | — | O(n log² n) | O(1) | No | Insertion | |
| Binary tree sort | O(n log n) | O(n log n) | O(n) | Yes | Insertion | When using a self-balancing binary search tree |
| Library sort | O(n log n) | O(n²) | O(n) | Yes | Insertion | |
| Merge sort | O(n log n) | O(n log n) | O(n) | Yes | Merging | |
| In-place merge sort | O(n log n) | O(n log n) | O(1) | Yes | Merging | |
| Heapsort | O(n log n) | O(n log n) | O(1) | No | Selection | |
| Smoothsort | — | O(n log n) | O(1) | No | Selection | |
| Quicksort | O(n log n) | O(n²) | O(log n) | No | Partitioning | Naïve variants use O(n) space |
| Introsort | O(n log n) | O(n log n) | O(log n) | No | Hybrid | used in most implementations of STL |
| Patience sorting | — | O(n²) | O(n) | No | Insertion | Finds all the longest increasing subsequences within O(n log n) |
This table describes sorting algorithms that are not comparison sorts. As such, they are not limited by a O(nlog n) lower bound. Complexities below are in terms of n, the number of items to be sorted, k, the size of each key, and s, the chunk size used by the implementation. Many of them are based on the assumption that the key size is large enough that all entries have unique key values, and hence that n << 2k.
| Name | Average | Worst | Memory | Stable | n << 2k? | Notes |
|---|---|---|---|---|---|---|
| Pigeonhole sort | O(n+2k) | O(n+2k) | O(2k) | Yes | Yes | |
| Bucket sort | O(n·k) | O(n²·k) | O(n·k) | Yes | No | Assumes uniform distribution of elements from the domain in the array. |
| Counting sort | O(n+2k) | O(n+2k) | O(n+2k) | Yes | Yes | |
| LSD Radix sort | O(n·k/s) | O(n·k/s) | O(n) | Yes | No | |
| MSD Radix sort | O(n·k/s) | O(n·(k/s)·2s) | O((k/s)·2s) | No | No | |
| Spreadsort | O(n·k/log(n)) | O(n·(k - log(n)).5) | O(n) | No | No | Asymptotics are based on the assumption that n << 2k, but the algorithm does not require this. |
This table describes some sorting algorithms that are impractical for real-life use due to extremely poor performance or a requirement for specialized hardware.
| Name | Average | Worst | Memory | Stable | Comparison | Other notes |
|---|---|---|---|---|---|---|
| Bogosort | O(n × n!) | 8 | O(1) | No | Yes | Average time using Knuth shuffle |
| Bozo sort | O(n × n!) | 8 | O(1) | No | Yes | Average time is asymptotically half that of bogosort |
| Stooge sort | O(n2.71) | O(n2.71) | O(log n) | No | Yes | |
| Bead sort | N/A | N/A | — | N/A | No | Requires specialized hardware |
| Simple pancake sort | O(n) | O(n) | O(log n) | No | Yes | Count is number of flips. |
| Sorting networks | O(log n) | O(log n) | O(n•log n) | Yes | No | Requires a custom circuit of size O(n•log n) |
Summaries of popular sorting algorithms
Bubble sort
Selection sort
Insertion sort
Shell sort
Merge sort
Heapsort
Quicksort
Bucket sort
Radix sort
Memory usage patterns and index sorting
When the size of the array to be sorted approaches or exceeds the available primary memory, so that (much slower) disk or swap space must be employed, the memory usage pattern of a sorting algorithm becomes important, and an algorithm that might have been fairly efficient when the array fit easily in RAM may become impractical. In this scenario, the total number of comparisons becomes (relatively) less important, and the number of times sections of memory must be copied or swapped to and from the disk can dominate the performance characteristics of an algorithm. Thus, the number of passes and the localization of comparisons can be more important than the raw number of comparisons, since comparisons of nearby elements to one another happen at system bus speed (or, with caching, even at CPU speed), which, compared to disk speed, is virtually instantaneous.For example, the popular recursive quicksort algorithm provides quite reasonable performance with adequate RAM, but due to the recursive way that it copies portions of the array it becomes much less practical when the array does not fit in RAM, because it may cause a number of slow copy or move operations to and from disk. In that scenario, another algorithm may be preferable even if it requires more total comparisons.
One way to work around this problem, which works well when complex records (such as in a relational database) are being sorted by a relatively small key field, is to create an index into the array and then sort the index, rather than the entire array. (A sorted version of the entire array can then be produced with one pass, reading from the index, but often even that is unnecessary, as having the sorted index is adequate.) Because the index is much smaller than the entire array, it may fit easily in memory where the entire array would not, effectively eliminating the disk-swapping problem. This procedure is sometimes called "tag sort".[2]
Another technique for overcoming the memory-size problem is to combine two algorithms in a way that takes advantages of the strength of each to improve overall performance. For instance, the array might be subdivided into chunks of a size that will fit easily in RAM (say, a few thousand elements), the chunks sorted using an efficient algorithm (such as quicksort or heapsort), and the results merged as per mergesort. This is more efficient than just doing mergesort in the first place, but it requires less physical RAM (to be practical) than a full quicksort on the whole array.
Techniques can also be combined. For sorting very large sets of data that vastly exceed system memory, even the index may need to be sorted using an algorithm or combination of algorithms designed to perform reasonably with virtual memory, i.e., to reduce the amount of swapping required.
Graphical representations
- Hobart and William Smith Colleges has a graphical demonstration of various sorting algorithms. It includes explanations of each step in the sort, plus a timing test. Thus is a good tool for the novice computer science student.
- University of British Columbia has a graphical demonstration of various in-place sorts.
- University of Linköping has a similar demonstration in their course material.
- The sort algorithm visualizer provides graphical animation of 11 comparison-based sorts in a variety of initial configurations.
- A Java applet visually demonstrates several sorting algorithms.
- Boston College has a graphical demonstration of various sorting algorithms with various initial conditions.
Proof of the lower bound for the asymptotic running time of comparison sort algorithms
The worst case running time of any comparison sort algorithm is
- Let
be the decision tree for all possible comparisons between elements of a list
. So
is complete and the label of each vertex
of the tree are of the form
and
is source of two edges labeled
and
. The comparisons made by a sorting algorithm running over
corresponds to a path from the root of
to a leaf. One should note that for each possible permutation
of the elements in
there is such a path. So
has at least
leaves. We know that
is a binary tree, so it has a maximum of
leaves, where
is its height. We have
and
.
Summarizing,
See also
- Big O notation
- External sorting
- Sorting networks (compare)
- Collation
- Schwartzian transform
- Shuffling algorithms
- Search algorithms
- : Uses sorting a deck of cards with many sorting algorithms as an example
Notes and references
- D. E. Knuth, The Art of Computer Programming, Volume 3: Sorting and Searching.
External links
- http://www.iti.fh-flensburg.de/lang/algorithmen/sortieren/algoen.htm has explanations and analyses of many of these algorithms.
- Ricardo Baeza-Yates' sorting algorithms on the Web
- 'Dictionary of Algorithms, Data Structures, and Problems'
- Slightly Skeptical View on Sorting Algorithms Softpanorama page that discusses several classic algorithms and promotes alternatives to quicksort.
- Sorting Revisited
- Sorting small arrays C++ code and analysis looking at the best (fastest) way to sort small (ie < 64) items, including special attention on sorting networks.
- For a repository of algorithms with source code and lectures, see The Stony Brook Algorithm Repository
- Graphical Java illustrations of the Bubble sort, Insertion sort, Quicksort, and Selection sort
- xSortLab - An interactive Java demonstration of Bubble, Insertion, Quick, Select and Merge sorts, which displays the data as a bar graph with commentary on the workings of the algorithm printed below the graph.
- Sorting Algorithms Demo - Java applets that chart the progress of several common sorting algorithms while sorting an array of data using in-place algorithms.
- http://www.iti.fh-flensburg.de/lang/algorithmen/sortieren/sortcontest/sortcontest.htm - An applet visually demonstrating a contest between a number of different sorting algorithms
Italian:
- The Three Dimensional Bubble Sort- A method of sorting in three or more dimensions (of questionable merit)
- Sorting Algorithms Visualized - Java applet visualizing the different approaches of 22 sorting algorithms (configurable input set size and value distribution)
- Sort huge amounts of data by doing a multi-phase sorting on temporary file
- AniSort - Java applet visualizing 6 different sorting algorithms
- Naturalorder - Natural Order Numerical Sorting
- Pointers to sorting visualizations
- Monkey Sort discussion (another name for Bozo-sort)
- Extensive collection of animated sorting algorithms with source code.
- Several sorting algorithms demonstrated in Java
- OPL booklet of the main sorting algorithms by Michael Lamont
- QiSort - A new O(n log n) algorithm for 2007
Sorting algorithms | ||
|---|---|---|
| Theory | Computational complexity theory | ![]() |
| Exchange sorts | Bubble sort | |
| Selection sorts | Selection sort | |
| Insertion sorts | Insertion sort | |
| Merge sorts | Merge sort | |
| Non-comparison sorts | Radix sort | |
| Others | Topological sorting | |
Computer science, or computing science, is the study of the theoretical foundations of information and computation and their implementation and application in computer systems.
..... Click the link for more information.
..... Click the link for more information.
Mathematics (colloquially, maths or math) is the body of knowledge centered on such concepts as quantity, structure, space, and change, and also the academic discipline that studies them. Benjamin Peirce called it "the science that draws necessary conclusions".
..... Click the link for more information.
..... Click the link for more information.
In mathematics, computing, linguistics, and related disciplines, an algorithm is a finite list of well-defined instructions for accomplishing some task that, given an initial state, will proceed through a well-defined series of successive states, eventually terminating in an
..... Click the link for more information.
..... Click the link for more information.
list is a collection of entities/items.
In the context of object-oriented programming languages, a list is defined as an instance of an abstract data type (ADT), formalizing the concept of an ordered collection of entities.
..... Click the link for more information.
In the context of object-oriented programming languages, a list is defined as an instance of an abstract data type (ADT), formalizing the concept of an ordered collection of entities.
..... Click the link for more information.
In mathematics, a total order, linear order, simple order, or (non-strict) ordering on a set X is any binary relation on X that is antisymmetric, transitive, and total.
..... Click the link for more information.
..... Click the link for more information.
In mathematics, the lexicographic or lexicographical order, (also known as dictionary order, alphabetic order or lexicographic(al) product), is a natural order structure of the Cartesian product of two ordered sets.
..... Click the link for more information.
..... Click the link for more information.
Sorting is any process of arranging items in some sequence and/or in different sets, and accordingly, it has two common, yet distinct meanings:
..... Click the link for more information.
- ordering: arranging items of the same kind, class, nature, etc.
..... Click the link for more information.
In computer science, a search algorithm, broadly speaking, is an algorithm that takes a problem as input and returns a solution to the problem, usually after evaluating a number of possible solutions.
..... Click the link for more information.
..... Click the link for more information.
Merge algorithms are a family of algorithms that run sequentially over multiple sorted lists, typically producing more sorted lists as output. This is well-suited for machines with tape drives.
..... Click the link for more information.
..... Click the link for more information.
canonicalization (abbreviated c14n) is a process for converting data that has more than one possible representation into a "standard" canonical representation. This can be done to compare different representations for equivalence, to count the number of distinct data
..... Click the link for more information.
..... Click the link for more information.
In mathematics, a total order, linear order, simple order, or (non-strict) ordering on a set X is any binary relation on X that is antisymmetric, transitive, and total.
..... Click the link for more information.
..... Click the link for more information.
For other senses of this word, see permutation (disambiguation).
Permutation is the rearrangement of objects or symbols into distinguishable sequences. Each unique ordering is called a permutation...... Click the link for more information.
Bubble sort is a simple sorting algorithm. It works by repeatedly stepping through the list to be sorted, comparing two items at a time and swapping them if they are in the wrong order. The pass through the list is repeated until no swaps are needed, which means the list is sorted.
..... Click the link for more information.
..... Click the link for more information.
Library sort, or gapped insertion sort is a sorting algorithm that uses an insertion sort, but with gaps in the array to accelerate subsequent insertions. The name comes from an analogy:
..... Click the link for more information.
..... Click the link for more information.
In computational complexity theory, big O notation is often used to describe how the size of the input data affects an algorithm's usage of computational resources (usually running time or memory).
..... Click the link for more information.
..... Click the link for more information.
In computer science, divide and conquer (D&C) is an important algorithm design paradigm. It works by recursively breaking down a problem into two or more sub-problems of the same (or related) type, until these become simple enough to be solved directly.
..... Click the link for more information.
..... Click the link for more information.
data structure is a way of storing data in a computer so that it can be used efficiently. Often a carefully chosen data structure will allow the most efficient algorithm to be used. The choice of the data structure often begins from the choice of an abstract data type.
..... Click the link for more information.
..... Click the link for more information.
A randomized algorithm or probabilistic algorithm is an algorithm which employs a degree of randomness as part of its logic. In common practice, this means that the machine implementing the algorithm has access to a pseudorandom number generator.
..... Click the link for more information.
..... Click the link for more information.
In computer science, best, worst and average cases of a given algorithm express what the resource usage is at least, at most and on average, respectively.
..... Click the link for more information.
..... Click the link for more information.
In computer science, a space-time or time-memory tradeoff is a situation where the memory use can be reduced at the cost of slower program execution, or vice versa, the computation time can be reduced at the cost of increased memory use.
..... Click the link for more information.
..... Click the link for more information.
Computer science, or computing science, is the study of the theoretical foundations of information and computation and their implementation and application in computer systems.
..... Click the link for more information.
..... Click the link for more information.
As a branch of the theory of computation in computer science, computational complexity theory investigates the problems related to the amounts of resources required for the execution of algorithms (e.g.
..... Click the link for more information.
..... Click the link for more information.
In computer science, best, worst and average cases of a given algorithm express what the resource usage is at least, at most and on average, respectively.
..... Click the link for more information.
..... Click the link for more information.
In computer science, best, worst and average cases of a given algorithm express what the resource usage is at least, at most and on average, respectively.
..... Click the link for more information.
..... Click the link for more information.
In computer science, best, worst and average cases of a given algorithm express what the resource usage is at least, at most and on average, respectively.
..... Click the link for more information.
..... Click the link for more information.
In computational complexity theory, big O notation is often used to describe how the size of the input data affects an algorithm's usage of computational resources (usually running time or memory).
..... Click the link for more information.
..... Click the link for more information.
In computational complexity theory, big O notation is often used to describe how the size of the input data affects an algorithm's usage of computational resources (usually running time or memory).
..... Click the link for more information.
..... Click the link for more information.
As a branch of the theory of computation in computer science, computational complexity theory investigates the problems related to the amounts of resources required for the execution of algorithms (e.g.
..... Click the link for more information.
..... Click the link for more information.
in-place algorithm is an algorithm which transforms a data structure using a small, constant amount of extra storage space. The input is usually overwritten by the output as the algorithm executes.
..... Click the link for more information.
..... Click the link for more information.
A comparison sort is a type of sorting algorithm that only reads the list elements through a single abstract comparison operation (often a "less than or equal to" operator) that determines which of two elements should occur first in the final sorted list.
..... Click the link for more information.
..... Click the link for more information.
This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.
Herod_Archelaus
