Information about Trie
In computer science, a trie, or prefix tree, is an ordered tree data structure that is used to store an associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree shows what key it is associated with. All the descendants of any one node have a common prefix of the string associated with that node, and the root is associated with the empty string. Values are normally not associated with every node, only with leaves and some inner nodes that happen to correspond to keys of interest.
The term trie comes from "retrieval." Due to this etymology it is pronounced [tri] ("tree"), although some encourage the use of [traɪ] ("try") in order to distinguish it from the more general tree.
In the example shown, keys are listed in the nodes and values below them. Each complete English word has an integer value associated with it. A trie can be seen as a deterministic finite automaton, although the symbol on each edge is often implicit in the order of the branches.
It is not necessary for keys to be explicitly stored in nodes. (In the figure, words are shown only to illustrate how the trie works.)
Though it is most common, tries need not be keyed by character strings. The same algorithms can easily be adapted to serve similar functions of ordered lists of any construct, e.g., permutations on a list of digits, permutations on a list of shapes, etc.
Advantages and drawbacks, relative to binary search tree
The following are the main advantages of tries over binary search trees (BSTs):- Looking up keys is faster. Looking up a key of length m takes worst case O(m) time. A BST takes O(log n) time, where n is the number of elements in the tree, because lookups depend on the depth of the tree, which is logarithmic in the number of keys. Also, the simple operations tries use during lookup, such as array indexing using a character, are fast on real machines.
- Tries can require less space when they contain a large number of short strings, because the keys are not stored explicitly and nodes are shared between keys with common initial subsequences.
- Tries help with longest-prefix matching, where we wish to find the key sharing the longest possible prefix of characters all unique.
Applications
As replacement of other data structures
As mentioned, a trie has a number of advantages over binary search trees. A trie can also be used to replace a hash table, over which it has the following advantages:- Looking up data in a trie is faster in the worst case, O(m) time, compared to an imperfect hash table. An imperfect hash table can have key collisions. A key collision is the hash function mapping of different keys to the same position in a hash table. The worst-case lookup speed in an imperfect hash table is O(N) time, but far more typically is O(1), with O(m) time spent evaluating the hash.
- There are no collisions of different keys in a trie.
- Buckets in a trie which are analogous to hash table buckets that store key collisions are only necessary if a single key is associated with more than one value.
- There is no need to provide a hash function or to change hash functions as more keys are added to a trie.
- A trie can provide an alphabetical ordering of the entries by key.
- Tries can be slower in some cases than hash tables for looking up data, especially if the data is directly accessed on a hard disk drive or some other secondary storage device where the random access time is high compared to main memory.
- It is not easy to represent all keys as strings, such as floating point numbers, which can have multiple string representations for the same floating point number, e.g. 1, 1.0, 1.00, +1.0, etc.
- Tries are frequently less space-efficient than hash tables.
- Unlike hash tables, tries are generally not already available in programming language toolkits.
Dictionary representation
A common application of a trie is storing a dictionary, such as one found on a mobile telephone. Such applications take advantage of a trie's ability to quickly search for, insert, and delete entries; however, if storing dictionary words is all that is required (i.e. storage of information auxiliary to each word is not required), a minimal acyclic deterministic finite automaton would use less space than a trie.Tries are also well suited for implementing approximate matching algorithms, including those used in spell checking software.
Algorithms
The following pseudo-code represents the general algorithm for determining whether a given string is in a trie. Note thatchildren is an array of a node's children; and we say that a "terminal" node is one which contains a valid word.
function isKeyInTrie (node, key):
if (key is an empty string): # base case
return is node terminal?
else: # recursive case
c = first character in key # this works because key is not empty
tail = key minus the first character
child = node.children[c]
if (child is null): # unable to recurse, although key is not-empty
return false
else:
return isKeyInTrie (child, tail)
Sorting
Lexicographic sorting of a set of keys can be accomplished with a simple trie-based algorithm as follows:- Insert all keys in a trie.
- Output all keys in the trie by means of pre-order traversal, which results in output that is in lexicographically increasing order, or post-order traversal, which results in output that is in lexicographically decreasing order. Pre-order traversal and post-order traversal are kinds of depth-first traversal. In-order traversal is another kind of depth-first traversal that is more appropriate for outputting the values that are in a binary search tree rather than a trie.
A trie forms the fundamental data structure of Burstsort; currently (2007) the fastest known, memory/cache-based, string sorting algorithm.
A parallel algorithm for sorting N keys based on tries is O(1) if there are N processors and the length of the keys have a constant upper bound. There is the potential that the keys might collide by having common prefixes or by being identical to one another, reducing or eliminating the speed advantage of having multiple processors operating in parallel.
Full text search
A special kind of trie, called a suffix tree, can be used to index all suffixes in a text in order to carry out fast full text searches.See also
- Radix tree
- Directed acyclic word graph (aka DAWG)
- Ternary search tries
- Acyclic deterministic finite automata
- Deterministic finite automata
- Judy array
- Search algorithm
- Extendible hashing
- Prefix Hash Tree
- Burstsort
- Luleå algorithm
External links
- NIST's Dictionary of Algorithms and Data Structures: Trie
- Tries by Lloyd Allison
- An Implementation of Double-Array Trie
- de la Briandais Tree
- Discussing a trie implementation in Lisp
References
- R. de la Briandais: File Searching Using Variable Length Keys. Proceedings of the Western Joint Computer Conference, 1959, pages 295–298.
- E. Fredkin: Trie Memory. Communications of the ACM, 3(9):490-499, Sept. 1960.
- Donald Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching, Third Edition. Addison-Wesley, 1997. ISBN 0-201-89685-0. Section 6.3: Digital Searching, pp.492–512.
- Cache-Efficient String Sorting Using Copying
Computer science, or computing science, is the study of the theoretical foundations of information and computation and their implementation and application in computer systems.
..... Click the link for more information.
..... Click the link for more information.
tree is a widely-used data structure that emulates a tree structure with a set of linked nodes.
..... Click the link for more information.
Nodes
A node may contain a value or a condition or represents a separate data structure or a tree of its own...... Click the link for more information.
data structure is a way of storing data in a computer so that it can be used efficiently. Often a carefully chosen data structure will allow the most efficient algorithm to be used. The choice of the data structure often begins from the choice of an abstract data type.
..... Click the link for more information.
..... Click the link for more information.
An associative array (also map, mapping, hash, dictionary, finite map, lookup table, and in query-processing an index or index file
..... Click the link for more information.
..... Click the link for more information.
string is an ordered sequence of symbols. These symbols are chosen from a predetermined set.
In programming, when stored in memory each symbol is represented using a numeric value.
..... Click the link for more information.
In programming, when stored in memory each symbol is represented using a numeric value.
..... Click the link for more information.
binary search tree (BST) is a binary tree data structure which has the following properties:
..... Click the link for more information.
- Each node has a value.
- A total order is defined on these values.
- The left subtree of a node contains only values less than the node's value.
..... Click the link for more information.
string is an ordered sequence of symbols. These symbols are chosen from a predetermined set.
In programming, when stored in memory each symbol is represented using a numeric value.
..... Click the link for more information.
In programming, when stored in memory each symbol is represented using a numeric value.
..... Click the link for more information.
Etymology is the study of the history of words - when they entered a language, from what source, and how their form and meaning have changed over time.
In languages with a long written history, etymology makes use of philology, the study of how words change from culture to
..... Click the link for more information.
In languages with a long written history, etymology makes use of philology, the study of how words change from culture to
..... Click the link for more information.
tree is a widely-used data structure that emulates a tree structure with a set of linked nodes.
..... Click the link for more information.
Nodes
A node may contain a value or a condition or represents a separate data structure or a tree of its own...... Click the link for more information.
In the theory of computation, a deterministic finite state machine or deterministic finite automaton (DFA) is a finite state machine where for each pair of state and input symbol there is one and only one transition to a next state.
..... Click the link for more information.
..... Click the link for more information.
binary search tree (BST) is a binary tree data structure which has the following properties:
..... Click the link for more information.
- Each node has a value.
- A total order is defined on these values.
- The left subtree of a node contains only values less than the node's value.
..... Click the link for more information.
In computational complexity theory, big O notation is often used to describe how the size of the input data affects an algorithm's usage of computational resources (usually running time or memory).
..... Click the link for more information.
..... Click the link for more information.
logarithm (to base b) of a number x is the exponent y that satisfies x = by. It is written logb(x) or, if the base is implicit, as log(x).
..... Click the link for more information.
..... Click the link for more information.
Longest prefix match refers to an algorithm used by routers in Internet Protocol (IP) networking to select an entry from a routing table.
Because each entry in a routing table may specify a network, one destination address may match more than one routing table entry.
..... Click the link for more information.
Because each entry in a routing table may specify a network, one destination address may match more than one routing table entry.
..... Click the link for more information.
In computer science, a hash table, or a hash map, is a data structure that associates keys with values. The primary operation it supports efficiently is a lookup: given a key (e.g. a person's name), find the corresponding value (e.g. that person's telephone number).
..... Click the link for more information.
..... Click the link for more information.
mobile phone or cell phone is a long-range, portable electronic device used for mobile communication. In addition to the standard voice function of a telephone, current mobile phones can support many additional services such as SMS for text messaging, email, packet switching
..... Click the link for more information.
..... Click the link for more information.
Acyclic deterministic finite automata (ADFA) are deterministic finite automata without cycles. In other words, they can only represent finite sets of strings. They can be used as a data structure for word storage with extremely fast search performance.
..... Click the link for more information.
..... Click the link for more information.
In computing, a spell checker is a software program designed to verify the spelling of words. A spell checker helps a user to ensure correct spelling, while suggesting corrections for wrongly spelled words.
..... Click the link for more information.
..... Click the link for more information.
In computer science, tree-traversal refers to the process of visiting each node in a tree data structure, exactly once, in a systematic way. Such traversals are classified by the order in which the nodes are visited.
..... Click the link for more information.
..... Click the link for more information.
In mathematics, the lexicographic or lexicographical order, (also known as dictionary order, alphabetic order or lexicographic(al) product), is a natural order structure of the Cartesian product of two ordered sets.
..... Click the link for more information.
..... Click the link for more information.
In computer science, tree-traversal refers to the process of visiting each node in a tree data structure, exactly once, in a systematic way. Such traversals are classified by the order in which the nodes are visited.
..... Click the link for more information.
..... Click the link for more information.
In computer science, tree-traversal refers to the process of visiting each node in a tree data structure, exactly once, in a systematic way. Such traversals are classified by the order in which the nodes are visited.
..... Click the link for more information.
..... Click the link for more information.
In computer science, tree-traversal refers to the process of visiting each node in a tree data structure, exactly once, in a systematic way. Such traversals are classified by the order in which the nodes are visited.
..... Click the link for more information.
..... Click the link for more information.
General Data
Class: Search algorithm
Data Structure: Graph
Time Complexity:
Space Complexity: where is the length of the longest simple path in the graph.
..... Click the link for more information.
Class: Search algorithm
Data Structure: Graph
Time Complexity:
Space Complexity: where is the length of the longest simple path in the graph.
..... Click the link for more information.
In computer science, tree-traversal refers to the process of visiting each node in a tree data structure, exactly once, in a systematic way. Such traversals are classified by the order in which the nodes are visited.
..... Click the link for more information.
..... Click the link for more information.
General Data
Class: Search algorithm
Data Structure: Graph
Time Complexity:
Space Complexity: where is the length of the longest simple path in the graph.
..... Click the link for more information.
Class: Search algorithm
Data Structure: Graph
Time Complexity:
Space Complexity: where is the length of the longest simple path in the graph.
..... Click the link for more information.
binary search tree (BST) is a binary tree data structure which has the following properties:
..... Click the link for more information.
- Each node has a value.
- A total order is defined on these values.
- The left subtree of a node contains only values less than the node's value.
..... Click the link for more information.
In computer science, radix sort is a sorting algorithm that sorts integers by processing individual digits. Because integers can represent strings of characters (e.g., names or dates) and specially formatted floating point numbers, radix sort is not limited to integers.
..... Click the link for more information.
..... Click the link for more information.
Burstsort and its variants are cache-efficient algorithms for sorting strings and are faster than quicksort and radix sort for large data sets.
..... Click the link for more information.
References
- Burst Tries: A Fast, Efficient Data Structure for String Keys
..... Click the link for more information.
In computer science, a parallel algorithm, as opposed to a traditional serial algorithm, is one which can be executed a piece at a time on many different processing devices, and then put back together again at the end to get the correct result.
..... Click the link for more information.
..... Click the link for more information.
This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.
Herod_Archelaus