Information about Heapsort
Heapsort is a comparison-based sorting algorithm, and is part of the selection sort family. Although somewhat slower in practice on most machines than a good implementation of quicksort, it has the advantage of a worst-case O(n log n) runtime. Heapsort is an in-place algorithm, but is not a stable sort.
Overview
Heapsort inserts the input list elements into a heap data structure. The largest value (in a max-heap) or the smallest value (in a min-heap) are extracted until none remain, the values having been extracted in sorted order. The heap's invariant is preserved after each extraction, so the only cost is that of extraction.During extraction, the only space required is that needed to store the heap. In order to achieve constant space overhead, the heap is stored in the part of the input array that has not yet been sorted. (The structure of this heap is described at Binary heap: Heap implementation.)
Heapsort uses two heap operations: insertion and root deletion. Each extraction places an element in the last empty location of the array. The remaining prefix of the array stores the unsorted elements.
Variations
- The most important variation to the simple variant is an improvement by R. W. Floyd which gives in practice about 25% speed improvement by using only one comparison in each siftup run which then needs to be followed by a siftdown for the original child; moreover it is more elegant to formulate. Heapsort's natural way of indexing works on indices from 1 up to the number of items. Therefore the start address of the data should be shifted such that this logic can be implemented avoiding unnecessary +/- 1 offsets in the coded algorithm.
- Ternary heapsort [1] uses a ternary heap instead of a binary heap; that is, each element in the heap has three children. It is more complicated to program, but does a constant number of times fewer swap and comparison operations. This is because each step in the shift operation of a ternary heap requires three comparisons and one swap, whereas in a binary heap two comparisons and one swap are required. The ternary heap does two steps in less time than the binary heap requires for three steps, which multiplies the index by a factor of 9 instead of the factor 8 of three binary steps. Ternary heapsort is about 12% faster than the simple variant of binary heapsort.
- The smoothsort sorting algorithm http://www.cs.utexas.edu/users/EWD/ewd07xx/EWD796a.PDF is a variation of heapsort developed by Edsger Dijkstra in 1981. Like heapsort, smoothsort's upper bound is O(n log n). The advantage of smoothsort is that it comes closer to O(n) time if the input is already sorted to some degree, whereas heapsort averages O(n log n) regardless of the initial sorted state. Due to its complexity, smoothsort is rarely used.
Comparison with other sorts
Heapsort primarily competes with quicksort, another very efficient general purpose nearly-in-place comparison-based sort algorithm.Quicksort is typically somewhat faster, due to better cache behavior and other factors, but the worst-case running time for quicksort is O(n2), which is unacceptable for large data sets and can be deliberately triggered given enough knowledge of the implementation, creating a security risk. See quicksort for a detailed discussion of this problem, and possible solutions.
Thus, because of the O(n log n) upper bound on heapsort's running time and constant upper bound on its auxiliary storage, embedded systems with real-time constraints or systems concerned with security often use heapsort.
Heapsort also competes with merge sort, which has the same time bounds, but requires Ω(n) auxiliary space, whereas heapsort requires only a constant amount. Heapsort also typically runs more quickly in practice on machines with small or slow data caches. On the other hand, merge sort has several advantages over heapsort:
- Like quicksort, merge sort on arrays has considerably better data cache performance, often outperforming heapsort on a modern desktop PC, because it accesses the elements in order.
- Merge sort is a stable sort.
- Merge sort parallelizes better; the most trivial way of parallelizing merge sort achieves close to linear speedup, while there is no obvious way to parallelize heapsort at all.
- Merge sort can be easily adapted to operate on linked lists and very large lists stored on slow-to-access media such as disk storage or network attached storage. Heapsort relies strongly on random access, and its poor locality of reference makes it very slow on media with long access times.
Pseudocode
The following is the "simple" way to implement the algorithm, in pseudocode, where swap is used to swap two elements of the array. Notice that the arrays are zero based in this example. function heapSort(a, count) is input: an unordered array a of length count(first place a in max-heap order) heapify(a, count)
end := count - 1 while end > 0 do (swap the root(maximum value) of the heap with the last element of the heap) swap(a[end], a[0]) (decrease the size of the heap by one so that the previous max value will stay in its proper placement) end := end - 1 (put the heap back in max-heap order) siftDown(a, 0, end)
function heapify(a,count) is (start is assigned the index in a of the last parent node) start := count ÷ 2 - 1
while start ≥ 0 do (sift down the node at index start to the proper place such that all nodes below the start index are in heap order) siftDown(a, start, count-1) start := start - 1 (after sifting down the root all nodes/elements are in heap order)
function siftDown(a, start, end) is input: end represents the limit of how far down the heap to sift. root := start
while root * 2 + 1 ≤ end do (While the root has at least one child) child := root * 2 + 1 (root*2+1 points to the left child) (If the child has a sibling and the child's value is less than its sibling's...) if child < end and a[child] < a[child + 1] then child := child + 1 (... then point to the right child instead) if a[root] < a[child] then (out of max-heap order) swap(a[root], a[child]) root := child (repeat to continue sifting down the child now) else return The heapify function can be thought of as building a heap from the bottom up, successively sifting downward to establish the heap property. An alternate version (shown below) that builds the heap top-down and sifts upward is conceptually simpler to grasp. This "siftUp" version can be visualized as starting with an empty heap and successively inserting elements. However, it is asymptotically slower: the "siftDown" version is O(n), and the "siftUp" version is O(n log n) in the worst case. The heapsort algorithm is O(n log n) overall using either version of heapify. function heapify(a,count) is (end is assigned the index of the first (left) child of the root) end := 1
while end < count (sift up the node at index end to the proper place such that all nodes above the end index are in heap order) siftUp(a, 0, end) end := end + 1 (after sifting up the last node all nodes are in heap order)
function siftUp(a, start, end) is input: start represents the limit of how far up the heap to sift. end is the node to sift up. child := end while child > start parent := ⌊(child - 1) ÷ 2? if a[parent] < a[child] then (out of max-heap order) swap(a[parent], a[child]) child := parent (repeat to continue sifting up the parent now) else return
C-code
Below is an implementation of the "standard" heapsort (also called bottom-up-heapsort). It is faster on average (see Knuth. Sec. 5.2.3, Ex. 18) and even better in worst-case behavior (1.5n log n) than the simple heapsort (2n log n). The sift_in routine is first a sift_up of the free position followed by a sift_down of the new item. The needed data-comparison is only in the macro data_i_LESS_THAN_ for easy adaption.This code is flawed - see
>
/* Heapsort based on ideas of J.W.Williams/R.W.Floyd/S.Carlsson */
#define data_i_LESS_THAN_(other) (data[i] < other)
#define MOVE_i_TO_free { data[free]=data[i]; free=i; }
void sift_in(unsigned count, SORTTYPE *data, unsigned free_in, SORTTYPE next)
{
unsigned i;
unsigned free = free_in;
// sift up the free node
for (i=2*free;i=free_in) && data_i_LESS_THAN_(next))
MOVE_i_TO_free
data[free] = next;
}
void heapsort(unsigned count, SORTTYPE *data)
{
unsigned j;
if (count <= 1) return;
data-=1; // map addresses to indices 1 til count
// build the heap structure
for(j=count / 2; j>=1; j--) {
SORTTYPE next = data[j];
sift_in(count, data, j, next);
}
// search next by next remaining extremal element
for(j= count - 1; j>=1; j--) {
SORTTYPE next = data[j + 1];
data[j + 1] = data[1]; // extract extremal element from the heap
sift_in(j, data, 1, next);
}
}
References
- J. W. J. Williams. Algorithm 232 - Heapsort, 1964, Communications of the ACM 7(6): 347–348.
- Robert W. Floyd. Algorithm 245 - Treesort 3, 1964, Communications of the ACM 7(12): 701.
- Svante Carlsson, Average-case results on heapsort, 1987, BIT 27(1): 2-17.
- Donald Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching, Third Edition. Addison-Wesley, 1997. ISBN 0-201-89685-0. Pages 144–155 of section 5.2.3: Sorting by Selection.
- Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Chapters 6 and 7 Respectively: Heapsort and Priority Queues
- A PDF of Dijkstra's original paper on Smoothsort
- Heaps and Heapsort Tutorial by David Carlson, St. Vincent College
Notes
1. ^ "Data Structures Using Pascal", Tenenbaum & Augenstein, 1991, page 405, gives Ternary Heapsort as an exercise for the student. "Write a sorting routine similar to the heapsort except that it uses a ternary heap."
See also
External links
- Analyze Heapsort in an online Javascript IDE
- Heapsort
- Heapsort animated
- NIST's Dictionary of Algorithms and Data Structures: Heapsort
- Sorting revisited
- Heapsort Animation
- Heapsort in C++ and explanation
Sorting algorithms | ||
|---|---|---|
| Theory | Computational complexity theory | ![]() |
| Exchange sorts | Bubble sort | |
| Selection sorts | Selection sort | |
| Insertion sorts | Insertion sort | |
| Merge sorts | Merge sort | |
| Non-comparison sorts | Radix sort | |
| Others | Topological sorting | |
A comparison sort is a type of sorting algorithm that only reads the list elements through a single abstract comparison operation (often a "less than or equal to" operator) that determines which of two elements should occur first in the final sorted list.
..... Click the link for more information.
..... Click the link for more information.
In computer science and mathematics, a sorting algorithm is an algorithm that puts elements of a list in a certain order. The most-used orders are numerical order and lexicographical order.
..... Click the link for more information.
..... Click the link for more information.
Selection sort is a sorting algorithm, specifically an in-place comparison sort. It has Θ(n2) complexity, making it inefficient on large lists, and generally performs worse than the similar insertion sort.
..... Click the link for more information.
..... Click the link for more information.
Quicksort is a well-known sorting algorithm developed by C. A. R. Hoare that, on average, makes (big O notation) comparisons to sort n items. However, in the worst case, it makes comparisons.
..... Click the link for more information.
..... Click the link for more information.
In computational complexity theory, big O notation is often used to describe how the size of the input data affects an algorithm's usage of computational resources (usually running time or memory).
..... Click the link for more information.
..... Click the link for more information.
in-place algorithm is an algorithm which transforms a data structure using a small, constant amount of extra storage space. The input is usually overwritten by the output as the algorithm executes.
..... Click the link for more information.
..... Click the link for more information.
In computer science and mathematics, a sorting algorithm is an algorithm that puts elements of a list in a certain order. The most-used orders are numerical order and lexicographical order.
..... Click the link for more information.
..... Click the link for more information.
Edsger Wybe Dijkstra
Born May 11 1930
Rotterdam, Netherlands
Died July 6 2002 (aged 72)
Nuenen, Netherlands
..... Click the link for more information.
Born May 11 1930
Rotterdam, Netherlands
Died July 6 2002 (aged 72)
Nuenen, Netherlands
..... Click the link for more information.
19th century - 20th century - 21st century
1950s 1960s 1970s - 1980s - 1990s 2000s 2010s
1978 1979 1980 - 1981 - 1982 1983 1984
Year 1981 (MCMLXXXI
..... Click the link for more information.
1950s 1960s 1970s - 1980s - 1990s 2000s 2010s
1978 1979 1980 - 1981 - 1982 1983 1984
Year 1981 (MCMLXXXI
..... Click the link for more information.
In computational complexity theory, big O notation is often used to describe how the size of the input data affects an algorithm's usage of computational resources (usually running time or memory).
..... Click the link for more information.
..... Click the link for more information.
Quicksort is a well-known sorting algorithm developed by C. A. R. Hoare that, on average, makes (big O notation) comparisons to sort n items. However, in the worst case, it makes comparisons.
..... Click the link for more information.
..... Click the link for more information.
In computational complexity theory, big O notation is often used to describe how the size of the input data affects an algorithm's usage of computational resources (usually running time or memory).
..... Click the link for more information.
..... Click the link for more information.
Quicksort is a well-known sorting algorithm developed by C. A. R. Hoare that, on average, makes (big O notation) comparisons to sort n items. However, in the worst case, it makes comparisons.
..... Click the link for more information.
..... Click the link for more information.
merge sort or mergesort is an O(n log n) comparison-based sorting algorithm. It is stable, meaning that it preserves the input order of equal elements in the sorted output. It is an example of the divide and conquer algorithmic paradigm.
..... Click the link for more information.
..... Click the link for more information.
CPU cache is a cache used by the central processing unit of a computer to reduce the average time to access memory. The cache is a smaller, faster memory which stores copies of the data from the most frequently used main memory locations.
..... Click the link for more information.
..... Click the link for more information.
In computer science, a parallel algorithm, as opposed to a traditional serial algorithm, is one which can be executed a piece at a time on many different processing devices, and then put back together again at the end to get the correct result.
..... Click the link for more information.
..... Click the link for more information.
In parallel computing, speedup refers to how much a parallel algorithm is faster than a corresponding sequential algorithm.
It is defined by the following formula:
where:
..... Click the link for more information.
It is defined by the following formula:
where:
- p is the number of processors
..... Click the link for more information.
linked list is one of the fundamental data structures, and can be used to implement other data structures. It consists of a sequence of nodes, each containing arbitrary data fields and one or two references ("links") pointing to the next and/or previous nodes.
..... Click the link for more information.
..... Click the link for more information.
Disk storage is a general category of a computer storage mechanisms, in which data is recorded on planar, round and rotating surfaces (disks, discs, or platters). A disk drive is a peripheral device used to collect information from.
..... Click the link for more information.
..... Click the link for more information.
Network-attached storage (NAS) is a file-level computer data storage connected to a computer network providing data access to heterogeneous network clients.
..... Click the link for more information.
Description
NAS hardware is similar to the traditional file server equipped with direct attached storage...... Click the link for more information.
random access is the ability to access an arbitrary element of a sequence in equal time. The opposite is sequential access, where a remote element takes longer time to access.
..... Click the link for more information.
..... Click the link for more information.
In computer science, locality of reference, also called the principle of locality, is the term applied to situations where the same value or related storage locations are frequently accessed.
..... Click the link for more information.
..... Click the link for more information.
Introsort or introspective sort is a sorting algorithm designed by David Musser in 1997. It begins with quicksort and switches to heapsort when the recursion depth exceeds a predefined value.
..... Click the link for more information.
..... Click the link for more information.
Robert W Floyd
Born May 8 1936
New York
Died September 25 2001 (aged 65)
Nationality American
..... Click the link for more information.
Born May 8 1936
New York
Died September 25 2001 (aged 65)
Nationality American
..... Click the link for more information.
Donald Ervin Knuth
Photographed by Jacob Appelbaum, 25 October 2005
Born January 10 1938
..... Click the link for more information.
Photographed by Jacob Appelbaum, 25 October 2005
Born January 10 1938
..... Click the link for more information.
Thomas H. Cormen is the co-author of Introduction to Algorithms, along with Charles Leiserson, Ron Rivest, and Cliff Stein. He is a Full Professor of computer science at Dartmouth College.
..... Click the link for more information.
..... Click the link for more information.
Charles E. Leiserson is a computer scientist, specializing in the theory of parallel computing and distributed computing, and particularly practical applications thereof; as part of this effort, he developed the Cilk multithreaded language.
..... Click the link for more information.
..... Click the link for more information.
..... Click the link for more information.
Clifford Stein is a computer scientist, currently working as a professor at Columbia University in New York, NY. He earned his BSE from Princeton University in 1987, a MS from Massachusetts Institute of Technology in 1989, and a PhD from Massachusetts Institute of Technology in
..... Click the link for more information.
..... Click the link for more information.
Introduction to Algorithms is a book by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. It is used as the textbook for algorithms courses at many universities.
..... Click the link for more information.
..... Click the link for more information.
This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.
Herod_Archelaus

